clairemcginty opened a new pull request, #3098:
URL: https://github.com/apache/parquet-java/pull/3098
### Rationale for this change
This PR continues the work outlined in #1452. It implements a `size()`
predicate for filtering on the number of elements in repeated fields:
```java
FilterPredicate hasThreeElements =
    size(intColumn("my_list_field"), Operators.Size.Operator.EQ, 3);
```
### What changes are included in this PR?
`size()` and `not(size())` are implemented for all list fields with a
**`required` element type**. Attempting to filter on a list of optional
elements will throw an exception in the schema validator. This is because the
existing record-level filtering setup
(`IncrementallyUpdatedFilterPredicateEvaluator`) only feeds non-null values to
the `ValueInspectors`. Thus, for an array like [1, 2, null, 4], it would count
only 3 elements. I can file a ticket to support this eventually, but I think
we'd have to rework the `FilteringRecordMaterializer` to be aware of
repetition/definition levels.
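To make the miscount concrete, here is a minimal plain-Java sketch. It does not use any Parquet classes; `NonNullCountSketch` and `countSeenValues` are hypothetical names standing in for an inspector that is only ever handed non-null values, which is the behavior described above.

```java
import java.util.Arrays;
import java.util.List;

public class NonNullCountSketch {
    // Hypothetical stand-in for a value inspector that only receives
    // non-null values: null elements are skipped and never counted.
    public static int countSeenValues(List<Integer> elements) {
        int count = 0;
        for (Integer element : elements) {
            if (element != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, null, 4);
        // The real list has 4 elements, but only 3 are ever observed.
        System.out.println("actual size:  " + list.size());
        System.out.println("counted size: " + countSeenValues(list));
    }
}
```

Under this assumption, a `size(..., EQ, 4)` filter would wrongly reject the list, which is why lists of optional elements are rejected up front instead.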
The list group itself can be `optional` or `required`. Null lists are
treated as having size 0. Again, this is due to the difficulty of
distinguishing null lists from empty ones at the record-level filtering step.
(Would love feedback on both of these design decisions!)
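The null-list rule can be sketched in plain Java as follows. This is only an illustration of the semantics stated above, not the PR's implementation; `NullListSizeSketch`, `effectiveSize`, and `sizeEquals` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullListSizeSketch {
    // A null list is treated the same as an empty one: size 0.
    public static int effectiveSize(List<Integer> listOrNull) {
        return listOrNull == null ? 0 : listOrNull.size();
    }

    // Hypothetical equivalent of an EQ size comparison under that rule.
    public static boolean sizeEquals(List<Integer> listOrNull, int expected) {
        return effectiveSize(listOrNull) == expected;
    }

    public static void main(String[] args) {
        // Both a null list and an empty list match "size == 0".
        System.out.println(sizeEquals(null, 0));
        System.out.println(sizeEquals(Collections.<Integer>emptyList(), 0));
        System.out.println(sizeEquals(Arrays.asList(1, 2, 3), 3));
    }
}
```

One consequence of this choice is that a filter for size 0 cannot tell a missing list apart from an empty one.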
### Are these changes tested?
Unit tests, plus a snapshot build tested locally against real datasets.
### Are there any user-facing changes?
New Operators API
Part of #1452