shuai-xu opened a new pull request, #2055:
URL: https://github.com/apache/orc/pull/2055

   
   ### What changes were proposed in this pull request?
   This PR fixes a bug in the C++ reader: if the column statistics in an ORC 
file are not fully written and lack the hasNull field, the user may get wrong 
results when reading the file.
   For example, take a file with schema struct<string col1, string col2> that 
has 10 rows, where every col1 value is non-null and every col2 value is null. 
The statistics written by Trino for col1 may be
   numberOfValues: 10
   stringStatistics {
     minimum: "10"
     maximum: "100"
     sum: 565
   }
   and the statistics for col2 may be just numberOfValues: 0. Neither of them 
has a hasNull field. When we then query for the rows where col2 is null, we 
get nothing back, although all 10 rows should match.
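
A minimal sketch of the failure mode, assuming the reader falls back to "no nulls" when hasNull is absent. `ColumnStats`, `hasNullNaive`, `hasNullConservative`, `canSkipForIsNull` and `makeCol2Stats` are hypothetical names used only for illustration; they are not the ORC C++ API, and the conservative fallback shown is one possible way to handle the missing field, not necessarily what this patch does.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-in for the per-column statistics stored in
// an ORC file. This is NOT the real ORC C++ type.
struct ColumnStats {
  std::uint64_t numberOfValues = 0;  // count of non-null values
  std::optional<bool> hasNull;       // empty when the writer omitted the field
};

// Naive reading: an absent hasNull is treated as "the column has no nulls".
bool hasNullNaive(const ColumnStats& stats) {
  return stats.hasNull.value_or(false);
}

// Conservative reading: an absent hasNull is treated as "nulls may exist".
bool hasNullConservative(const ColumnStats& stats) {
  return stats.hasNull.value_or(true);
}

// An IS NULL predicate may skip the data only if the column has no nulls.
bool canSkipForIsNull(bool hasNull) {
  return !hasNull;
}

// Rebuilds the col2 case from the description: 10 rows, all null, so
// numberOfValues is 0 and hasNull was never written.
ColumnStats makeCol2Stats() {
  ColumnStats col2;
  col2.numberOfValues = 0;
  return col2;
}

// Self-check of the two readings against the col2 example.
void checkCol2Example() {
  const ColumnStats col2 = makeCol2Stats();
  // Naive: the data is skipped, so "col2 IS NULL" wrongly returns nothing.
  assert(canSkipForIsNull(hasNullNaive(col2)));
  // Conservative: the data is read, so the null rows can be found.
  assert(!canSkipForIsNull(hasNullConservative(col2)));
}
```

With the naive fallback the col2 data is pruned before any row is evaluated, which matches the empty result described above. The conservative fallback gives up some pruning for files with incomplete statistics in exchange for correct results.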
   
   
   ### Why are the changes needed?
   Without this fix, the C++ reader can return wrong results for files whose 
column statistics omit the hasNull field.
   
   
   ### How was this patch tested?
   Added unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   

