adampinky85 commented on issue #43682: URL: https://github.com/apache/arrow/issues/43682#issuecomment-2309263837
Hi @mapleFU, with the example code above, using `pyarrow.parquet.read_table` doesn't work for our needs because:

1. users would need to use pyarrow instead of pandas;
2. users would need to know the dictionary columns for each file (this should be part of the schema); and
3. the data is not stored in the format we require, i.e. dictionary-encoded int fields rather than compressed string encoding.

We want to be able to build Parquet files with dictionary fields (which already works with the non-streaming C++ API) and use the C++ stream writer. Thanks
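
For context, a minimal sketch of the `pyarrow.parquet.read_table` workaround under discussion (the file path and column names below are hypothetical); it illustrates points 1 and 2 above, since the reader has to list the dictionary columns explicitly instead of getting them from the file's schema:

```python
import pyarrow.parquet as pq

# The reader must already know which columns were dictionary-encoded;
# they are not recovered automatically from the file's schema.
table = pq.read_table(
    "trades.parquet",                     # hypothetical file
    read_dictionary=["symbol", "venue"],  # hypothetical dictionary columns
)

# Dictionary columns surface as pandas Categorical, but this still
# requires calling pyarrow directly rather than pandas.read_parquet.
df = table.to_pandas()
```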
