Re: [PR] PARQUET-2474: Add FIXED_SIZE_LIST logical type [parquet-format]

via GitHub Tue, 22 Oct 2024 09:28:01 -0700


tustvold commented on PR #241:
URL: https://github.com/apache/parquet-format/pull/241#issuecomment-2429736936


   Some points in no particular order:
   
   * The parquet schema is authoritative, with any other schema information 
merely a hint. This makes the notion of using the arrow schema, or something 
else, to drive decode a little dubious.
   * The record shredding logic for lists is the single most complex, confusing 
and subtle aspect of any parquet reader, which:
     * Limits the pool of people who can implement / review such changes
     * Sets a very high bar for including such changes
   * Even an optimal record shredding implementation will never perform better 
than an implementation that can simply skip shredding entirely
   * Both arrow-rs and polars exploit the fact that the hybrid RLE encoding is 
effectively a bitmask if the max definition level is only 1, which allows for 
very efficient decode. This isn't possible when there are repetition levels
   * Performant record skipping, e.g. for predicate/index pushdown or late 
materialization, is not really possible against data with repetition levels
   * Many readers have quirky support for repetition levels and lists in 
general, especially w.r.t. areas where the specification has been ambiguous in 
the past. Finding ways for people to avoid these pain points is potentially 
valuable
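
To illustrate the bitmask point above, here is a minimal Python sketch. It is 
not taken from arrow-rs or polars, and the helper names are hypothetical. It 
assumes a bit-packed run at bit width 1 with values packed 
least-significant-bit first, which is the same bit order Arrow uses for 
validity bitmaps:

```python
def pack_bits_lsb(bits):
    """Pack 0/1 values into bytes, least-significant bit first.

    Hypothetical helper: stands in for a width-1 bit-packed run.
    """
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def validity_from_def_levels(def_levels, max_def_level=1):
    """Build a validity bitmap: a value is present iff its level equals the max."""
    return pack_bits_lsb([1 if d == max_def_level else 0 for d in def_levels])


# One nullable, non-nested column: max definition level is 1.
def_levels = [1, 0, 1, 1, 0, 0, 1, 1]

bitpacked_run = pack_bits_lsb(def_levels)         # what the encoder would store
validity = validity_from_def_levels(def_levels)   # what the reader wants

# With max definition level 1 the two are byte-identical, so under these
# assumptions a reader can copy the run instead of decoding level by level.
print(bitpacked_run == validity)
```

With repetition levels present, the definition levels no longer map one-to-one 
onto value slots, so this shortcut does not apply.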
   
   That's all to say that providing a way to encode fixed size lists seems 
like a very useful capability. That being said, it does seem to be a bit of a 
hack to make this a logical type, and doing so will potentially limit the 
options for encodings, statistics, sort orders, etc.
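
As a rough sketch of why skipping is cheaper for fixed size lists (hypothetical 
helper names, not any reader's actual API, and assuming no nulls or empty lists 
so that level positions and value positions coincide): with repetition levels a 
reader has to scan for row boundaries, whereas with a known list size the 
offset is plain arithmetic.

```python
def skip_rows_with_rep_levels(rep_levels, rows_to_skip):
    """Return the position where row `rows_to_skip` starts.

    A repetition level of 0 marks the start of a new row, so the reader
    must inspect levels one by one: O(number of levels skipped).
    """
    rows_seen = 0
    for i, level in enumerate(rep_levels):
        if level == 0:
            if rows_seen == rows_to_skip:
                return i
            rows_seen += 1
    return len(rep_levels)  # skipped past the end of the data


def skip_rows_fixed_size(list_size, rows_to_skip):
    """With a fixed list size the start offset is a multiplication: O(1)."""
    return rows_to_skip * list_size


# Three rows, each a list of three elements.
rep_levels = [0, 1, 1, 0, 1, 1, 0, 1, 1]

scanned = skip_rows_with_rep_levels(rep_levels, 2)
computed = skip_rows_fixed_size(3, 2)
print(scanned == computed)
```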
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


