jzxu commented on issue #22923: URL: https://github.com/apache/beam/issues/22923#issuecomment-1833308759
Hi, I noticed that despite https://github.com/apache/beam/pull/22925 being merged, DeferredDataFrame.to_csv() still doesn't respect the num_shards argument. Minimal test case:

```
from typing import NamedTuple

import apache_beam as beam
from apache_beam.dataframe import convert


class Row(NamedTuple):
  x: int


with beam.Pipeline('DirectRunner') as p:
  c = (p | beam.Create([Row(x=i) for i in range(1000000)]))
  df = convert.to_dataframe(c)
  df.to_csv('/tmp/apache_beam_test.csv', index=False, num_shards=2)
```

Running this with apache_beam 2.50.0 results in a single shard being written.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
