[ 
https://issues.apache.org/jira/browse/BEAM-12685?focusedWorklogId=633784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633784
 ]

ASF GitHub Bot logged work on BEAM-12685:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Aug/21 19:38
            Start Date: 04/Aug/21 19:38
    Worklog Time Spent: 10m 
      Work Description: dhercher commented on pull request #15246:
URL: https://github.com/apache/beam/pull/15246#issuecomment-892921897


   Could we remove the Reshuffle entirely?  There isn't a clear reason to 
change whichever settings were set upstream, nor a clear reason why the 
reshuffle adds much value.
   
   More than anything, it forces a commit if you are reading from Pub/Sub and 
removes a grouping that was set on purpose.  It seems like there isn't a 
strong reason to shuffle here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 633784)
    Remaining Estimate: 1h 10m  (was: 1h 20m)
            Time Spent: 50m  (was: 40m)

> Allow managed thread count in AvroIO
> ------------------------------------
>
>                 Key: BEAM-12685
>                 URL: https://issues.apache.org/jira/browse/BEAM-12685
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-files
>            Reporter: Dylan Hercher
>            Priority: P2
>   Original Estimate: 2h
>          Time Spent: 50m
>  Remaining Estimate: 1h 10m
>
> During execution, `ReadAllViaFileBasedSource` runs a Reshuffle and creates 
> an ungrouped set of file range readers.  This can easily cause OOM issues 
> when the number of groups changes, as there is no limit on the number of 
> concurrent file reads.
>  
> Using `Reshuffle.viaRandomKey().withNumBuckets(...)` instead keeps the same 
> default behavior, but lets the user configure the number of readers as and 
> when needed.
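
To illustrate the idea in the description above, here is a minimal sketch in plain Java with no Beam dependency. It is not Beam's implementation: the class `BucketedKeys` and the method `groupByRandomKey` are hypothetical names chosen for this example. It only shows why folding random keys into a fixed number of buckets caps the number of groups (and so, presumably, the number of concurrent readers), while a null bucket count keeps the unbounded default.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class BucketedKeys {
    /**
     * Groups items under random integer keys. Hypothetical helper for
     * illustration only. When numBuckets is null the keys are unrestricted;
     * otherwise each key is folded into the range [0, numBuckets).
     */
    public static <T> Map<Integer, List<T>> groupByRandomKey(List<T> items, Integer numBuckets) {
        Map<Integer, List<T>> groups = new HashMap<>();
        for (T item : items) {
            int key = ThreadLocalRandom.current().nextInt();
            if (numBuckets != null) {
                // floorMod keeps the result non-negative for a positive bucket count.
                key = Math.floorMod(key, numBuckets);
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> ranges = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ranges.add("file-range-" + i);
        }
        // Unbounded: typically close to one group per item.
        System.out.println(groupByRandomKey(ranges, null).size());
        // Bounded: never more than 8 groups.
        System.out.println(groupByRandomKey(ranges, 8).size());
    }
}
```

With a bucket count of 8, every item still lands in some group, but at most 8 groups exist no matter how many file ranges arrive.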



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
