emkornfield commented on issue #1452:
URL: https://github.com/apache/parquet-java/issues/1452#issuecomment-2271914678

   > (i.e. if user wants to filter to size(col, eq(0)), we can pass/fail the 
row groups if there are any values present), 
   
   I don't think this is accurate. Lists can contain only null elements 
(e.g. `[null, null]`), so we can't say for sure that no values present means 
the lists must be of size zero, and if a value is present we still can't say 
for sure whether there are lists with no values. As Gang noted, you can 
make these determinations with the repetition/definition levels.
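
   To make the point concrete, here is a minimal Python sketch (not 
parquet-java code). It assumes 3-level list encoding with an optional list and 
optional elements, so definition level 0 = null list, 1 = empty list, 2 = null 
element, 3 = non-null element. The name `shred_lists` is hypothetical.

```python
def shred_lists(lists):
    """Return (repetition_levels, definition_levels, values) for a column of
    optional lists with optional elements (assumed 3-level encoding)."""
    rep, defs, values = [], [], []
    for lst in lists:
        if lst is None:            # null list
            rep.append(0)
            defs.append(0)
        elif len(lst) == 0:        # empty list
            rep.append(0)
            defs.append(1)
        else:
            for i, elem in enumerate(lst):
                # first element starts a new list (0), the rest continue it (1)
                rep.append(0 if i == 0 else 1)
                if elem is None:   # null element, no value stored
                    defs.append(2)
                else:              # non-null element
                    defs.append(3)
                    values.append(elem)
    return rep, defs, values


all_null_elems = shred_lists([[None, None]])  # size 2, no values stored
all_empty = shred_lists([[]])                 # size 0, no values stored
mixed = shred_lists([[7], []])                # a value exists, yet one list is empty
```

   Both `all_null_elems` and `all_empty` store no values but differ in list 
size, and `mixed` stores a value while still containing an empty list. Only the 
levels tell these cases apart.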
   
   I think we can draw the following conclusions from the repetition and 
definition levels (assuming 3-level list encoding):
   1.  If all repetition levels are zero and all definition levels are 0, then 
all lists are null (note: Parquet, I believe, tends to conflate empty and null 
lists, so it would be good to clarify the intended semantics).
   1.  If all repetition levels are zero and all definition levels are 1, then 
all lists are of size zero.
   1.  If all repetition levels are zero and all definition levels are > 
MAX_DEFINITION_LEVEL - 1, then all lists are of size 1.
   1.  If the count of MAX_REPETITION_LEVEL entries is < X, then all lists have 
size <= X, since a list of size n contributes n - 1 entries at 
MAX_REPETITION_LEVEL (in practice this might not really filter out too much).
   
   This logic could be extended for intermediate lists.
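
   The rules above can be sketched in Python as a check over the levels of one 
row group. This is an illustration under stated assumptions, not parquet-java 
code: `summarize`, `MAX_REP` and `MAX_DEF` are hypothetical names, and the 
column is assumed to be a single non-nested list with an optional list and 
optional elements. The last rule uses the fact that a list of size n 
contributes n - 1 entries at the maximum repetition level, so the largest 
possible list has one more element than that count.

```python
from collections import Counter

MAX_REP = 1  # assumption: single (non-nested) list column
MAX_DEF = 3  # assumption: optional list with optional elements


def summarize(rep_levels, def_levels):
    """Describe what the levels alone let us conclude about list sizes."""
    rep_hist = Counter(rep_levels)
    def_hist = Counter(def_levels)
    n = len(def_levels)
    all_rep_zero = rep_hist[0] == n

    if n and all_rep_zero and def_hist[0] == n:
        return "all lists null"
    if n and all_rep_zero and def_hist[1] == n:
        return "all lists empty"
    if n and all_rep_zero and def_hist[MAX_DEF] == n:
        return "all lists have size 1"
    # Each list of size k has k - 1 entries at MAX_REP, so no list can be
    # longer than (count of MAX_REP entries) + 1.
    max_size = rep_hist[MAX_REP] + 1
    return f"all lists have size <= {max_size}"
```

   Only the level counts are needed here, so the same checks could in 
principle run against per-level histograms instead of the raw levels.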
   
   
   
   
   



