gszadovszky commented on pull request #855:
URL: https://github.com/apache/parquet-mr/pull/855#issuecomment-762721754


   Thanks a lot for working on this. It is a good catch!
   
   I had to investigate a bit why we do not catch this in the unit tests. 
The answer is that in most cases a 
[NOOP](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/compat/FilterCompat.java#L62)
 filter is created instead of a `null` value. We use this `NOOP` if no 
filter is specified in the builder/config. However, it is still possible to set 
`null` explicitly in the builder/config, so the NPE may occur.
   I've also realized that in the older filter implementations (row group level 
min/max, dictionary) the `NOOP` filter is fine and does not have a significant 
performance cost over a `null` check. However, column index or bloom 
filter based filtering does have a performance cost, because both read the 
related data from the file even for `NOOP`.
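
   To illustrate the idea, here is a minimal, self-contained sketch of such a 
guard. It is not the parquet-mr code: `Filter`, `NOOP`, `normalize`, 
`applyFilter` and `expensiveReads` are hypothetical names, and the counter only 
simulates reading column index / bloom filter data. The assumption is that a 
`null` filter should behave like `NOOP`, and that neither should trigger a read.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FilterGuardSketch {
    /** Hypothetical stand-in for a filter type; not the parquet-mr API. */
    interface Filter {}

    /** Shared no-op instance, used when no filter is configured. */
    static final Filter NOOP = new Filter() {};

    /** Counts how often the (simulated) expensive filtering data read happens. */
    static final AtomicInteger expensiveReads = new AtomicInteger();

    /** Maps a possibly-null filter to NOOP so later code never dereferences null. */
    static Filter normalize(Filter filter) {
        return filter == null ? NOOP : filter;
    }

    /** Returns true if filtering data was read; skips the read for null and NOOP. */
    static boolean applyFilter(Filter filter) {
        Filter f = normalize(filter);
        if (f == NOOP) {
            // Nothing to filter, so do not touch column index / bloom filter data.
            return false;
        }
        expensiveReads.incrementAndGet(); // simulated read of filtering data
        return true;
    }

    public static void main(String[] args) {
        applyFilter(null);            // no NPE, no read
        applyFilter(NOOP);            // no read
        applyFilter(new Filter() {}); // a real filter: one read
        System.out.println(expensiveReads.get()); // prints 1
    }
}
```

   The identity check against `NOOP` is only one possible design; the actual 
fix may detect the no-op case differently.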
   
   If you don't mind, @wangyum, I would like to take care of this (including 
the NPE and the potential performance issues).

