Re: [I] [Feature Request]: Allow customization of filename and sharding for dataframe IOs. [beam]

via GitHub Thu, 30 Nov 2023 00:28:45 -0800


jzxu commented on issue #22923:
URL: https://github.com/apache/beam/issues/22923#issuecomment-1833308759


   Hi, I noticed that despite https://github.com/apache/beam/pull/22925 being 
merged, DeferredDataFrame.to_csv() still doesn't respect the num_shards 
argument. Minimal test case:
   
   ```
   from typing import NamedTuple
   import apache_beam as beam
   from apache_beam.dataframe import convert
   
   # Schema'd element type so the PCollection can be converted to a dataframe.
   class Row(NamedTuple):
     x: int
   
   with beam.Pipeline('DirectRunner') as p:
     c = (p | beam.Create([Row(x=i) for i in range(1000000)]))
     df = convert.to_dataframe(c)
     # Two output shards are requested here, but only one is written.
     df.to_csv('/tmp/apache_beam_test.csv', index=False, num_shards=2)
   ```
   
   Running this with apache_beam 2.50.0 results in a single shard being written.
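
One way to check how many shards a run produced is to list the files the pipeline wrote. The following is a minimal sketch, assuming the output files are written with names that begin with the path passed to to_csv(). The helper name `list_shards` is hypothetical and not part of Beam.

```python
import glob


def list_shards(path_prefix):
  """Return the sorted files whose names start with path_prefix.

  Assumption: every output shard is a file whose name begins with the
  path that was passed to to_csv().
  """
  # Escape the prefix so characters such as '[' are matched literally.
  return sorted(glob.glob(glob.escape(path_prefix) + '*'))


if __name__ == '__main__':
  shards = list_shards('/tmp/apache_beam_test.csv')
  print(len(shards), 'shard file(s):', shards)
```

With num_shards=2 honored, this would be expected to list two files rather than one.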


