emkornfield commented on issue #1452: URL: https://github.com/apache/parquet-java/issues/1452#issuecomment-2271914678
> (i.e. if user wants to filter to size(col, eq(0)), we can pass/fail the row groups if there are any values present),

I don't think this is accurate. You can have lists that contain only null elements (e.g. `[null, null]`), so we can't say for sure that no values present means the lists must be of size zero, and if there is a value present we can't say for sure whether there are lists with no values. As Gang noted, you can make these determinations with the repetition/definition levels. I think we can draw the following conclusions from the repetition and definition levels (assuming 3-level list encoding):

1. If all repetition levels are zero and all definition levels are 0, then all lists are null (note: I believe Parquet tends to conflate empty and null lists, so it would be good to clarify the intended semantics).
1. If all repetition levels are zero and all definition levels are 1, then all lists are of size zero.
1. If all repetition levels are zero and all definition levels are > MAX_DEFINITION_LEVEL - 1, then all lists are of size 1.
1. If the count of entries at MAX_REPETITION_LEVEL is < X, then all lists have size <= X, since a list of size n contributes only n - 1 entries at MAX_REPETITION_LEVEL (in practice this might not really filter out too much).

This logic could be extended to intermediate lists.
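The level-based rules above can be sketched roughly as follows. This is only an illustration, not parquet-java code: the function name `summarize_list_sizes` is hypothetical, the levels are passed in as plain Python lists, and it assumes a single optional list with optional elements in 3-level encoding (so definition levels run from 0 to 3 and the maximum repetition level is 1). For the last rule it uses the bound `count + 1`, because a list of size n contributes n - 1 entries at the maximum repetition level.

```python
from typing import Sequence


def summarize_list_sizes(
    rep_levels: Sequence[int],
    def_levels: Sequence[int],
    max_rep: int = 1,  # assumed: one level of list nesting
    max_def: int = 3,  # assumed: optional list, optional element
) -> str:
    """Describe what the levels alone imply about the list sizes."""
    if len(rep_levels) != len(def_levels):
        raise ValueError("repetition and definition levels must align")
    if not rep_levels:
        return "no lists"

    all_rep_zero = all(r == 0 for r in rep_levels)

    # Rule 1: every entry starts a new list and nothing is defined.
    if all_rep_zero and all(d == 0 for d in def_levels):
        return "all lists null"
    # Rule 2: every list is defined but has no elements.
    if all_rep_zero and all(d == 1 for d in def_levels):
        return "all lists empty"
    # Rule 3: every list has exactly one defined element.
    if all_rep_zero and all(d > max_def - 1 for d in def_levels):
        return "all lists size 1"

    # Rule 4: entries at max_rep continue an existing list, so no list can
    # hold more than (number of such entries + 1) elements.
    continuations = sum(1 for r in rep_levels if r == max_rep)
    return f"all lists size <= {continuations + 1}"
```

For example, a single list `[null, null]` would be encoded under these assumptions as repetition levels `[0, 1]` and definition levels `[2, 2]`. The sketch reports it as `all lists size <= 2` rather than as empty, even though no non-null values are present.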
