alamb commented on issue #46404:
URL: https://github.com/apache/arrow/issues/46404#issuecomment-2873252315

   > Ok, I looked at this a bit and the problem is that the file contains a
   > data page with a header larger than 16 MB (probably because non-truncated
   > statistics are written out). While technically valid, this is not desirable
   > and arrow-rs should probably avoid producing such files. @alamb
   
   > So you think setting
   > [this](https://arrow.apache.org/rust/parquet/file/properties/struct.WriterPropertiesBuilder.html#method.set_statistics_truncate_length)
   > parameter to something below 16MiB should solve the issue? (If yes, this
   > should probably be the default on the arrow-rs side, right?)
   
   I think you should set that parameter to something much lower, like 128, as it
refers to the maximum size of each value, not the size of the entire page
(which is what parquet-cpp appears to limit to 16MB).
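
To illustrate why a small per-value limit is enough, here is a minimal, standard-library-only sketch of truncating min/max statistics. The function names `truncate_min` and `truncate_max` are hypothetical and this is not the arrow-rs implementation; it only assumes that statistics are compared as byte strings in lexicographic order. In arrow-rs itself the limit would be set via the `set_statistics_truncate_length` builder method linked above.

```rust
/// Truncate a statistics minimum to at most `max_len` bytes.
/// Any prefix of a byte string sorts at or before the original,
/// so the prefix is still a valid lower bound.
fn truncate_min(value: &[u8], max_len: usize) -> Vec<u8> {
    value[..value.len().min(max_len)].to_vec()
}

/// Truncate a statistics maximum to at most `max_len` bytes.
/// A bare prefix would sort before the original, so the last byte
/// that is not 0xFF is incremented to keep a valid upper bound.
/// Returns `None` when no such byte exists (truncation is not possible).
fn truncate_max(value: &[u8], max_len: usize) -> Option<Vec<u8>> {
    if value.len() <= max_len {
        return Some(value.to_vec());
    }
    let mut out = value[..max_len].to_vec();
    // Drop trailing 0xFF bytes, which cannot be incremented.
    while let Some(&last) = out.last() {
        if last == 0xFF {
            out.pop();
        } else {
            break;
        }
    }
    // Empty here means every kept byte was 0xFF (or max_len was 0).
    let last = out.last_mut()?;
    *last += 1;
    Some(out)
}
```

With a limit of 128, even a value of many megabytes contributes at most 128 bytes per statistic, so the page header stays small regardless of how large the page data is.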
   
   I did notice that the default value is not well documented and the default 
is actually `None` (no truncation). I will file a ticket to consider a 
different 

