Alexander Petrossian (PAF) created ORC-1553: -----------------------------------------------
Summary: Reading information from Row group, where there are 0 records of SArg column
Key: ORC-1553
URL: https://issues.apache.org/jira/browse/ORC-1553
Project: ORC
Issue Type: Bug
Affects Versions: 1.9.2
Reporter: Alexander Petrossian (PAF)

We have created an .orc file using the Apache ORC library. I cannot provide a reproducible way to create such a file. We have statistics for 100% of row groups, checked with orc dump.

But when we search that file, we see very strange behavior:

{code}
TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0 stringStatistics { } hasNull: false
TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) to YES_NO_NULL
DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is included.
{code}

If the existing statistics report 0 values, there is obviously no need to read that row group. Yet we get a YES_NO_NULL decision, which forces inclusion of that row group in subsequent processing. This is meaningless and bad for performance.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
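The reporter's reasoning can be made concrete with a small Java sketch. It is hypothetical and is not ORC's `RecordReaderImpl` logic or API: the class `RowGroupPruningSketch`, the `StringStats` holder and the `evaluateEquals`/`isNeeded` methods are invented for illustration, and the local enum only reuses the truth-value names that appear in the log. It shows the decision the report expects. When a row group's statistics say there are zero non-null values and no nulls, an EQUALS predicate cannot match any row, so the result should be NO and the row group can be skipped instead of being marked YES_NO_NULL.

```java
/**
 * Hypothetical sketch, NOT ORC's RecordReaderImpl: evaluates an EQUALS
 * predicate against one row group's string statistics and decides whether
 * the row group must be read.
 */
public class RowGroupPruningSketch {

    // Local enum that reuses the truth-value names seen in the TRACE log.
    enum TruthValue { YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL }

    // Hypothetical stand-in for the per-row-group statistics shown in the log.
    static final class StringStats {
        final long numberOfValues; // count of non-null values
        final boolean hasNull;
        final String min;          // null when no values were recorded
        final String max;

        StringStats(long numberOfValues, boolean hasNull, String min, String max) {
            this.numberOfValues = numberOfValues;
            this.hasNull = hasNull;
            this.min = min;
            this.max = max;
        }
    }

    static TruthValue evaluateEquals(StringStats stats, String literal) {
        if (stats.numberOfValues == 0) {
            // No non-null values: EQUALS can never be true. With nulls present the
            // comparison is NULL for every row; without them no row can match.
            return stats.hasNull ? TruthValue.NULL : TruthValue.NO;
        }
        if (stats.min == null || stats.max == null) {
            // Values exist but bounds are unknown: cannot prune.
            return stats.hasNull ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
        }
        if (literal.compareTo(stats.min) < 0 || literal.compareTo(stats.max) > 0) {
            // Literal lies outside [min, max]: no value can equal it.
            return stats.hasNull ? TruthValue.NO_NULL : TruthValue.NO;
        }
        if (stats.min.equals(literal) && stats.max.equals(literal)) {
            // Every non-null value equals the literal.
            return stats.hasNull ? TruthValue.YES_NULL : TruthValue.YES;
        }
        return stats.hasNull ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
    }

    // In this sketch a row group is needed only if some row could evaluate to TRUE.
    static boolean isNeeded(TruthValue tv) {
        switch (tv) {
            case YES: case YES_NULL: case YES_NO: case YES_NO_NULL:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Statistics from the report: numberOfValues: 0, no min/max, hasNull: false.
        StringStats emptyGroup = new StringStats(0, false, null, null);
        TruthValue tv = evaluateEquals(emptyGroup, "71231231212");
        System.out.println(tv + " needed=" + isNeeded(tv)); // prints "NO needed=false"
    }
}
```

Treating the zero-values case first keeps the check independent of min/max, which are absent in the logged `stringStatistics { }`.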