rok commented on PR #241:
URL: https://github.com/apache/parquet-format/pull/241#issuecomment-3987279321

   > This is quite an old PR so I am reading earlier comments to understand 
what's going on. Please correct me if I am wrong anywhere or missed something.
   
   Thanks for taking the time!
    
   > This PR adds a new `FIXED_SIZE_LIST` logical type containing `[num_values 
x FLBA element]` per such logical list. The underlying FLBA data would encode 
as non-nullable PLAIN (no repetition levels and no def levels so no nulls at 
any level?).
   
   The [num_values x FLBA] rows themselves would be nullable with definition 
levels. What we wouldn't have is intra-row nullability.
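
To illustrate the row-level versus intra-row distinction, here is a minimal sketch, not the PR's implementation. It assumes a hypothetical `FIXED_SIZE_LIST(FLOAT, 3)` column whose elements are packed little-endian into one 12-byte FLBA per row; `encode_rows` and `LIST_SIZE` are made-up names for illustration only.

```python
import struct

LIST_SIZE = 3  # hypothetical FIXED_SIZE_LIST(FLOAT, 3)

def encode_rows(rows):
    """Return (definition_levels, flba_values) for optional fixed-size-list rows.

    Assumption: a max definition level of 1 (optional column, no nesting) and
    little-endian float32 elements inside each FLBA value.
    """
    def_levels, values = [], []
    for row in rows:
        if row is None:
            def_levels.append(0)  # null row: no FLBA value is stored for it
            continue
        if len(row) != LIST_SIZE or any(x is None for x in row):
            # No intra-row nullability: every element slot must be filled.
            raise ValueError("rows must contain exactly LIST_SIZE non-null elements")
        def_levels.append(1)
        values.append(struct.pack(f"<{LIST_SIZE}f", *row))
    return def_levels, values

def_levels, values = encode_rows([[1.0, 2.0, 3.0], None, [4.0, 5.0, 6.0]])
```

A null row costs one definition level and no value bytes, while a row such as `[1.0, None, 3.0]` cannot be represented at all under this scheme.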
   
   > 1. Why not allow all or at least all fixed length physical types (INTs, 
FLOAT, DOUBLE, BOOLEAN etc) to be a part of `FIXED_SIZE_LIST` instead of just 
the `FIXED_LEN_BYTE_ARRAY`s type (do we only care for FLOAT16, DECIMALS, and 
UUIDs here)?
   
   FLBA is meant as the physical type (container) for an arbitrary 
FIXED_SIZE_LIST(type, size) logical type. All fixed-length physical types are 
allowed as elements.
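
As a rough sketch of the "FLBA as container" idea: the FLBA width would be the element width times the list size. The format-code table, the little-endian layout, and the helper names below are assumptions for illustration, not something taken from the PR text.

```python
import struct

# Assumed struct format codes for a few fixed-width Parquet physical types.
ELEMENT_FORMATS = {"INT32": "i", "INT64": "q", "FLOAT": "f", "DOUBLE": "d"}

def flba_length(element_type, size):
    """Byte width of the FLBA that would hold `size` elements of `element_type`."""
    return struct.calcsize("<" + ELEMENT_FORMATS[element_type]) * size

def pack_list(element_type, values):
    """Pack one fixed-size list into a single FLBA value (little-endian assumed)."""
    return struct.pack(f"<{len(values)}{ELEMENT_FORMATS[element_type]}", *values)

def unpack_list(element_type, size, flba):
    """Reinterpret an FLBA value as `size` elements of `element_type`."""
    return list(struct.unpack(f"<{size}{ELEMENT_FORMATS[element_type]}", flba))
```

For example, `flba_length("INT64", 4)` gives 32, and `unpack_list("INT32", 3, pack_list("INT32", [1, -2, 3]))` round-trips to `[1, -2, 3]`.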
   
   > 2. Why only PLAIN encoding allowed for the contained FLBA data? Is it to 
allow zero-copy?
   
   Encoding here is meant for byte layout within the FLBA. FIXED_SIZE_LIST 
columns can use any encoding that supports FLBA (plain, dictionary, 
delta_byte_array, byte_stream_split).
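
To make the encoding point concrete, here is a simplified sketch of how byte-stream-split treats fixed-width values: byte `j` of every value goes into stream `j`, and the streams are concatenated. It operates on opaque FLBA byte strings, so it does not care what element type is packed inside. The function names are hypothetical and this is not a full page encoder.

```python
def byte_stream_split_encode(values, width):
    """Scatter byte j of each `width`-byte value into stream j, then concatenate."""
    if any(len(v) != width for v in values):
        raise ValueError("all values must be exactly `width` bytes long")
    return b"".join(bytes(v[j] for v in values) for j in range(width))

def byte_stream_split_decode(data, width):
    """Inverse of the encode step: gather byte i of each stream into value i."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of `width`")
    n = len(data) // width  # number of values, which is also the stream length
    return [bytes(data[j * n + i] for j in range(width)) for i in range(n)]
```

With two 2-byte values `b"\x01\x02"` and `b"\x03\x04"`, the encoded buffer is `b"\x01\x03\x02\x04"`, and decoding restores the original values.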
   
   > Assuming all of the above is true, then from cuDF's perspective, we would 
definitely get some speed boost from not having to decode (and write) levels 
data for such types and then viewing the data as lists would also be trivial.
   > 
   > To that effect, if this type is added to Parquet, I would prefer if it 
supports more than just FLBAs to make the effort to support it worthwhile. 
CC'ing @pmattione-nvidia as he can speak more on the overhead incurred from 
decoding levels data in the last libcudf version.
   
   As stated above, this already supports all fixed-width primitive types as 
elements; FLBA is just the container. Glad to hear this would be useful for 
cuDF!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

