[ 
https://issues.apache.org/jira/browse/BEAM-12434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Sternby updated BEAM-12434:
---------------------------------
    Resolution: Implemented
        Status: Resolved  (was: Triage Needed)

> implement num_shard side_input to WriteToTFRecord
> -------------------------------------------------
>
>                 Key: BEAM-12434
>                 URL: https://issues.apache.org/jira/browse/BEAM-12434
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-py-tfrecord
>    Affects Versions: 3.0.0, 2.29.0, 2.30.0, 2.31.0, 2.32.0
>            Reporter: Johan Sternby
>            Assignee: Johan Sternby
>            Priority: P2
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> As concisely explained in
> https://stackoverflow.com/questions/49156159/can-i-pass-side-inputs-to-apache-beam-ptransforms
> the following pipeline:
> EXAMPLES_PER_SHARD = 5.0
> num_tfexamples = (tfexample_strs
>                   | "count tf examples" >> beam.combiners.Count.Globally())
> num_shards = num_tfexamples | ("compute number of shards" >> beam.Map(
>     lambda num_examples: int(math.ceil(num_examples / EXAMPLES_PER_SHARD))))
> _ = tfexample_strs | ("output to tfrecords" >> beam.io.WriteToTFRecord(
>     OUTPUT_DIR, num_shards=beam.pvalue.AsSingleton(num_shards)))
> fails with
> File "/usr/local/lib/python3.7/dist-packages/apache_beam/io/iobase.py", line 1011, in start_bundle
>     self.counter = random.randint(0, self.count - 1)
> TypeError: unsupported operand type(s) for -: 'AsSingleton' and 'int' [while running 'output VALIDATION to tfrecords/Write/WriteImpl/ParDo(_RoundRobinKeyFn)']
> The WriteToTFRecord transform in the Apache Beam Python SDK does not
> currently support passing a side input as num_shards.
> This can easily be solved by implementing _RoundRobinKeyFn slightly
> differently, so that the shard count is passed to the ParDo as a side input
> rather than as a class init value.
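
The change proposed in the description can be illustrated with a minimal sketch. It is plain Python with no Beam import, and the class name RoundRobinKeyFn is a hypothetical stand-in for Beam's _RoundRobinKeyFn, not the code that was actually merged. The assumption is that the shard count arrives as an argument to process(), the way a resolved side-input value would, instead of being stored in __init__:

```python
import random


class RoundRobinKeyFn:
    """Hypothetical stand-in for Beam's _RoundRobinKeyFn (no Beam dependency)."""

    def start_bundle(self):
        # The shard count is not known yet, so defer choosing a start key.
        self.counter = None

    def process(self, element, count):
        # `count` is supplied per call, as a materialized side input would be.
        if self.counter is None:
            self.counter = random.randint(0, count - 1)
        # Advance round-robin through the keys 0 .. count - 1.
        self.counter = (self.counter + 1) % count
        yield self.counter, element


fn = RoundRobinKeyFn()
fn.start_bundle()
keyed = [pair for x in range(6) for pair in fn.process(x, 3)]
print(keyed)
```

In a real pipeline, a DoFn written this way would presumably be applied as beam.ParDo(fn, beam.pvalue.AsSingleton(num_shards)), so the runner resolves the singleton before process() is called and the subtraction no longer sees an AsSingleton object.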



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
