If we go with any option that restricts the number of outputs, then in the example we should discuss what it does and why restricting the number of outputs is not considered a good thing.
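To make that concrete, here is a rough sketch of how the `--numOutputShards` flag proposed in the quoted comment below might behave. It is plain Java with no Beam dependency, so the class and method names are hypothetical, and the flag itself is only a proposal, not an existing Beam option. The file names assume Beam's default shard name template, `-SSSSS-of-NNNNN`. The comments carry the kind of explanation the example could give its readers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NumOutputShardsSketch {
    // Hypothetical flag from this thread; it is a proposal, not an existing Beam option.
    static final String FLAG = "--numOutputShards=";
    // Default of 1 keeps the example output easy to inspect (3 was also suggested).
    static final int DEFAULT_SHARDS = 1;

    // Returns the requested shard count, or the default when the flag is absent.
    static int parseNumShards(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(FLAG)) {
                int n = Integer.parseInt(arg.substring(FLAG.length()));
                if (n < 1) {
                    throw new IllegalArgumentException("numOutputShards must be >= 1");
                }
                return n;
            }
        }
        return DEFAULT_SHARDS;
    }

    // Builds the file names a fixed shard count would produce, assuming the
    // "-SSSSS-of-NNNNN" template (zero-padded shard index and shard count).
    static List<String> shardNames(String prefix, int numShards) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            names.add(String.format(Locale.ROOT, "%s-%05d-of-%05d", prefix, i, numShards));
        }
        return names;
    }

    public static void main(String[] args) {
        // In the real example this value would be handed to the write transform.
        // Fixing the shard count makes the output predictable, but it also limits
        // how much the runner can parallelize the write, which is why it should
        // not be presented as the recommended setting for real pipelines.
        int numShards = parseNumShards(args);
        for (String name : shardNames("/path/to/output", numShards)) {
            System.out.println(name);
        }
    }
}
```

Run with no arguments, this prints the single name `/path/to/output-00000-of-00001`; with `--numOutputShards=3` it prints three names ending in `-of-00003`.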

On Tue, Jul 12, 2016 at 2:11 AM, Amit Sela (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/BEAM-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372225#comment-15372225 ]
>
> Amit Sela commented on BEAM-434:
> --------------------------------
>
> I sort of prefer 2, but by letting the user pass the numShards
> configuration (which may need a better name)
> Like I mentioned in the PR, if we want to give a simple example result on
> one hand, while keeping in the user's mind the fact that multiple shards
> are a thing to consider, we could add a --numShards option and add it to
> the examples code with a default of 1 (or 3).
> If we want the users to know about multiple output shards, why should we
> keep the examples "pure" ?
>
> How about adding an option named "--numOutputShards" with default value 1
> (or 3, I could live with 3 :) ) and adding this to the examples README,
> thus giving a better experience in terms of "seeing" the output, while
> keeping the multiple-shards "on the table" and as a bonus, the Travis CI
> tests could still run with as many shards as we want (while I wanted
> examples to be easy enough, I definitely didn't want that for Travis!)
>
> WDYT ?
>
>
> > When examples write output to file it creates many output files instead of one
> > ------------------------------------------------------------------------------
> >
> >                 Key: BEAM-434
> >                 URL: https://issues.apache.org/jira/browse/BEAM-434
> >             Project: Beam
> >          Issue Type: Bug
> >          Components: examples-java
> >            Reporter: Amit Sela
> >            Assignee: Amit Sela
> >            Priority: Minor
> >
> > When using `TextIO.Write.to("/path/to/output")` without any
> > restrictions on the number of shards, it might generate many output files
> > (depending on your input), for WordCount for example, you'll get as many
> > output files as unique words in your input.
> > Since I think examples are expected to execute in a friendly manner to
> > "see" what it does and not optimize for performance in some way, I suggest
> > to use `withoutSharding()` when writing the example output to an output
> > file.
> > Examples I could find that behave this way:
> > org.apache.beam.examples.WordCount
> > org.apache.beam.examples.complete.TfIdf
> > org.apache.beam.examples.cookbook.DeDupExample
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
