[ 
https://issues.apache.org/jira/browse/BEAM-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775352#comment-16775352
 ] 

Robert Burke commented on BEAM-4468:
------------------------------------

I've finally hit the point in my performance work where jobs that are large 
and long-running enough, with poor enough key distributions, make this 
necessary. Hitting it required a very high-cardinality key space and a 
sufficiently large volume of data.

> Go SDK-Tune in memory pre-combine caching for Lifted Combines.
> --------------------------------------------------------------
>
>                 Key: BEAM-4468
>                 URL: https://issues.apache.org/jira/browse/BEAM-4468
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Minor
>
> Requires [BEAM-4276|https://issues.apache.org/jira/browse/BEAM-4276] to be 
> completed first.
> Additional performance tweaks to the in-memory per-key accumulator cache 
> during the pre-combine phase of a lifted combine.
> This can include any of:
>  * capping the number of key-accumulator pairs in the cache, and draining 
> them eagerly after x elements have been seen, or evicting keys by some 
> heuristic after the cache has seen X distinct keys.
>  * providing a counter on cache size (key/element counts), exposable through 
> the metrics interface or another runner-standard counter, to permit observing 
> the cache's status, especially if it could grow without bound within a bundle.
>  
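
As a rough illustration of the two ideas in the description above (a cap on 
cached keys with eviction, plus counters for observing the cache), here is a 
minimal sketch. It is not the Go SDK's implementation: the combineCache type, 
its fields, and the int64 sum accumulator are all hypothetical, and the 
eviction heuristic is simply "evict an arbitrary key", chosen for brevity. 
It assumes maxKeys is at least 1.

```go
package main

// combineCache is a hypothetical per-bundle pre-combine cache that sums
// int64 values per key. It is an illustration only, not Beam SDK code.
type combineCache struct {
	maxKeys   int                         // cap on distinct keys held at once (assumed >= 1)
	accums    map[string]int64            // per-key accumulators
	emit      func(key string, acc int64) // receives evicted or drained accumulators
	elements  int64                       // total elements folded into the cache
	evictions int64                       // keys evicted because the cap was reached
}

func newCombineCache(maxKeys int, emit func(string, int64)) *combineCache {
	return &combineCache{
		maxKeys: maxKeys,
		accums:  make(map[string]int64),
		emit:    emit,
	}
}

// Add folds v into key's accumulator. If key is new and the cache is
// already at its cap, one existing key is evicted first.
func (c *combineCache) Add(key string, v int64) {
	if _, ok := c.accums[key]; !ok && len(c.accums) >= c.maxKeys {
		c.evictOne()
	}
	c.accums[key] += v
	c.elements++
}

// evictOne emits and removes a single arbitrary key. Go's map iteration
// order is unspecified, so this is effectively random eviction.
func (c *combineCache) evictOne() {
	for k, acc := range c.accums {
		c.emit(k, acc)
		delete(c.accums, k)
		c.evictions++
		return
	}
}

// Drain emits and removes every cached accumulator, for example when the
// bundle finishes.
func (c *combineCache) Drain() {
	for k, acc := range c.accums {
		c.emit(k, acc)
		delete(c.accums, k)
	}
}

// Keys reports how many distinct keys are currently cached.
func (c *combineCache) Keys() int { return len(c.accums) }
```

Evicted accumulators are partial results, so whatever receives them from emit 
still has to merge them per key downstream, as a lifted combine already does 
after the pre-combine phase. The Keys, elements, and evictions values are the 
kind of numbers that could be reported through a metrics interface to watch 
cache growth within a bundle.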



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
