adampinky85 commented on issue #43682: URL: https://github.com/apache/arrow/issues/43682#issuecomment-2309263837
Hi @mapleFU, with the example code above, using `pyarrow.parquet.read_table` doesn't work for our needs because:

1. users would need to use pyarrow instead of pandas;
2. users would need to know the dictionary columns for each file (this should be part of the schema); and
3. the data is not stored in the format we require, i.e. dictionary-encoded int fields rather than compressed string encoding.

We want to be able to build Parquet files with dictionary fields (which already works with the non-streaming C++ API) and use the C++ stream writer. Thanks
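
For context, a minimal sketch of the `pyarrow.parquet.read_table` workaround under discussion (the file path and column names below are hypothetical); it illustrates points 1 and 2 above, since the reader has to list the dictionary columns explicitly instead of getting them from the file's schema:

```python
import pyarrow.parquet as pq

# The reader must already know which columns were dictionary-encoded;
# they are not recovered automatically from the file's schema.
table = pq.read_table(
    "trades.parquet",                     # hypothetical file
    read_dictionary=["symbol", "venue"],  # hypothetical dictionary columns
)

# Dictionary columns surface as pandas Categorical, but this still
# requires calling pyarrow directly rather than pandas.read_parquet.
df = table.to_pandas()
```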
