[ https://issues.apache.org/jira/browse/BEAM-6713?focusedWorklogId=207950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207950 ]

ASF GitHub Bot logged work on BEAM-6713:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Mar/19 18:24
            Start Date: 05/Mar/19 18:24
    Worklog Time Spent: 10m 
      Work Description: chamikaramj commented on pull request #7893: 
[BEAM-6713] Add withMaxNumWritersPerBundle from WriteFiles to FileIO …
URL: https://github.com/apache/beam/pull/7893#discussion_r262623683
 
 

 ##########
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
 ##########
 @@ -431,7 +431,8 @@
         .setNumShards(0)
         .setCodec(TypedWrite.DEFAULT_SERIALIZABLE_CODEC)
         .setMetadata(ImmutableMap.of())
-        .setWindowedWrites(false);
+        .setWindowedWrites(false)
+        .setMaxNumWritersPerBundle(WriteFiles.DEFAULT_MAX_NUM_WRITERS_PER_BUNDLE);
 
 Review comment:
   Beam generally tries to minimize knobs and let the runners make decisions 
related to bundle size, dynamic splitting, etc. I'm afraid that exposing knobs like 
this through the interface of IO connectors might promote patterns that will make 
runners less efficient in the future. And once we expose a knob, we cannot take 
it back.
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 207950)
    Time Spent: 1h 10m  (was: 1h)

> FileIO and TextIO unable to alter WriteFiles maxNumWritersPerBundle
> -------------------------------------------------------------------
>
>                 Key: BEAM-6713
>                 URL: https://issues.apache.org/jira/browse/BEAM-6713
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Kyle Winkelman
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When attempting to run a batch workflow with FileIO.write(), I was getting 
> job failures because WriteFiles.DEFAULT_MAX_NUM_WRITERS_PER_BUNDLE caused a 
> significant amount of data to be shuffled. Increasing this limit would solve 
> my issues, and luckily WriteFiles already has withMaxNumWritersPerBundle, 
> but unfortunately FileIO and TextIO do not.
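To make the reported behaviour concrete, here is a toy Java model of a per-bundle writer cap. It is a sketch, not Beam code: `BundleWriterSim` is a hypothetical class, and it assumes that, as the report describes, records whose destination cannot get a writer once the cap is reached are set aside and shuffled instead of being written directly by the bundle. It shows why a cap that is low relative to the number of destinations in a bundle inflates the shuffled volume.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model (not Beam's implementation) of a per-bundle writer cap:
 * a bundle may open at most maxWriters writers, one per destination; records for
 * any further destination are "spilled" and would have to be shuffled.
 */
final class BundleWriterSim {
  final Map<String, List<String>> openWriters = new HashMap<>();
  final List<String> spilled = new ArrayList<>();
  private final int maxWriters;

  BundleWriterSim(int maxWriters) {
    this.maxWriters = maxWriters;
  }

  void process(String destination, String record) {
    List<String> writer = openWriters.get(destination);
    if (writer == null && openWriters.size() < maxWriters) {
      // Still room for another writer in this bundle: open one for this destination.
      writer = new ArrayList<>();
      openWriters.put(destination, writer);
    }
    if (writer != null) {
      writer.add(record); // written directly by this bundle
    } else {
      spilled.add(record); // cap reached: record takes the shuffle path
    }
  }

  public static void main(String[] args) {
    for (int cap : new int[] {2, 5}) {
      BundleWriterSim sim = new BundleWriterSim(cap);
      // Ten records spread round-robin over five destinations.
      for (int i = 0; i < 10; i++) {
        sim.process("dest-" + (i % 5), "record-" + i);
      }
      System.out.println("cap=" + cap + " spilled=" + sim.spilled.size());
    }
  }
}
```

Running `main` prints `cap=2 spilled=6` and then `cap=5 spilled=0`: with ten records over five destinations, a cap of two leaves six records for the shuffle path, while a cap that covers every destination spills nothing.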



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
