arthurpassos commented on issue #32723:
URL: https://github.com/apache/arrow/issues/32723#issuecomment-1567327113

   > > 1. Yes, I think so.
   > > 2. I think 
[ArrowReaderProperties](https://github.com/apache/arrow/blob/130f9e981aa98c25de5f5bfe55185db270cec313/cpp/src/parquet/properties.h#L778)
 is probably where this belongs.  For per-column settings you can probably find 
inspiration from ParquetProperties (global might be fine for an initial 
implementation).
   > > 3. IIRC it's not really a memory limit as much as it is a limitation of the 
underlying address space of the Binary/String arrays, which allow for at most 
2GB of data in a row group.  I don't recall the code well enough to know if 
there are other edge cases that you might encounter, but I think this would 
solve most issues.
   > 
   > Cool, thanks. I have updated the draft PR with some refactorings, but it's 
no longer working. I suspect it's related to the dictionary encoding / decoding 
classes; they seem to be hard-coded to `int`, which might not work for LARGE* 
variants. Do you know if it's necessary to have a 64-bit version of 
dictionaries?
   
   The encoding / decoding code is huge and somewhat complex, so it would be great 
if I could skip changing that. It would mean tons of changes, and I am kind of 
afraid of introducing bugs.
   https://github.com/arthurpassos/arrow/blob/main/cpp/src/parquet/encoding.h
   https://github.com/arthurpassos/arrow/blob/main/cpp/src/parquet/encoding.cc
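
For context on the 2GB limit from point 3: Binary/String arrays address their value bytes with 32-bit signed offsets, while the LARGE* variants use 64-bit offsets. Below is a minimal, standalone sketch of that capacity check. `FitsInOffsets` is a hypothetical helper written for illustration only; it is not part of the Arrow or Parquet C++ API, and it does not model the dictionary decoders.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not an Arrow API): returns true if `total_bytes` of
// value data can be addressed by offsets of type OffsetT.
template <typename OffsetT>
constexpr bool FitsInOffsets(std::uint64_t total_bytes) {
  return total_bytes <=
         static_cast<std::uint64_t>(std::numeric_limits<OffsetT>::max());
}

// 2 GiB of value data, expressed in bytes.
constexpr std::uint64_t kTwoGiB = std::uint64_t{1} << 31;

// 32-bit offsets (Binary/String) top out one byte short of 2 GiB.
static_assert(FitsInOffsets<std::int32_t>(kTwoGiB - 1), "fits in int32 offsets");
static_assert(!FitsInOffsets<std::int32_t>(kTwoGiB), "overflows int32 offsets");

// 64-bit offsets (the LARGE* variants) handle the same payload.
static_assert(FitsInOffsets<std::int64_t>(kTwoGiB), "fits in int64 offsets");
```

The checks run at compile time, so the snippet only needs to compile. It shows why switching the offset type lifts the limit, but it says nothing about whether the dictionary index type has to change as well.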


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
