[ https://issues.apache.org/jira/browse/BEAM-12685?focusedWorklogId=633757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633757 ]
ASF GitHub Bot logged work on BEAM-12685:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Aug/21 18:47
Start Date: 04/Aug/21 18:47
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #15246:
URL: https://github.com/apache/beam/pull/15246#issuecomment-892889266
Hmm, I think we usually try to avoid giving this kind of knob to users,
because it's easy for these parameters to become obsolete and end up
hardcoded in legacy pipelines. I'm wondering: do you think there's another
way of accomplishing this?
In Dataflow we have GroupIntoBatches.withAutoSharding, which provides a
sharding parameter that scales automatically with the number of workers.
Perhaps this would help?
Issue Time Tracking
-------------------
Worklog Id: (was: 633757)
Remaining Estimate: 1h 20m (was: 1.5h)
Time Spent: 40m (was: 0.5h)
> Allow managed thread count in AvroIO
> ------------------------------------
>
> Key: BEAM-12685
> URL: https://issues.apache.org/jira/browse/BEAM-12685
> Project: Beam
> Issue Type: Improvement
> Components: io-java-files
> Reporter: Dylan Hercher
> Priority: P2
> Original Estimate: 2h
> Time Spent: 40m
> Remaining Estimate: 1h 20m
>
> During execution, `ReadAllViaFileBasedSource` runs a `Reshuffle` and creates
> an ungrouped set of file range readers. Because there is no limit on the
> number of concurrent file reads, this can easily cause OOM issues when the
> number of groups changes.
>
> Using `Reshuffle.viaRandomKey().withNumBuckets(...)` instead keeps the same
> default behavior, but lets the user configure the number of readers as and
> when needed.
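
The bucketing idea in the description above can be sketched in plain Java. This is only an illustration of why a fixed bucket count caps the number of groups. It is not Beam code and not how `Reshuffle` is implemented. The class name `BucketedKeys`, the method `groupIntoBuckets`, and the "file-range-N" strings are hypothetical, and how groups map to concurrent readers depends on the runner, which the sketch does not model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Plain-Java illustration only (assumption: not Beam's actual implementation).
final class BucketedKeys {

  // Assigns each element a random key in [0, numBuckets) and groups by that key.
  // With a fixed bucket count, the number of groups can never exceed numBuckets,
  // no matter how many elements (for example, file ranges) arrive.
  static <T> Map<Integer, List<T>> groupIntoBuckets(
      List<T> elements, int numBuckets, Random random) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    Map<Integer, List<T>> groups = new HashMap<>();
    for (T element : elements) {
      int key = random.nextInt(numBuckets); // key is always in [0, numBuckets)
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<String> ranges = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      ranges.add("file-range-" + i); // hypothetical file range identifiers
    }
    Map<Integer, List<String>> groups = groupIntoBuckets(ranges, 8, new Random(42));
    // The group count is bounded by the bucket count, not by the input size.
    System.out.println("groups: " + groups.size());
  }
}
```

In a Beam pipeline the corresponding knob would be the bucket count passed to `Reshuffle.viaRandomKey().withNumBuckets(...)`. The trade-off raised in the review comment still applies: a hardcoded bucket count does not adapt when the number of workers changes.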