Alexander Petrossian (PAF) created ORC-1553:
-----------------------------------------------

             Summary: Reading information from Row group, where there are 0 
records of SArg column
                 Key: ORC-1553
                 URL: https://issues.apache.org/jira/browse/ORC-1553
             Project: ORC
          Issue Type: Bug
    Affects Versions: 1.9.2
            Reporter: Alexander Petrossian (PAF)


We created an .orc file using the Apache ORC library; I cannot provide a 
reproducible way to create such a file.
Statistics are present for 100% of the row groups, as checked with orc dump.

But when we search that file, we see very strange behavior:

{code}
TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
stringStatistics {
}
hasNull: false

TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) 
to YES_NO_NULL
DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is 
included.
{code}

If there are 0 values according to the existing statistics, there is obviously 
no need to read that row group.

And yet we get a YES_NO_NULL decision, which forces inclusion of that row group 
in the subsequent operation, which is meaningless and bad for performance.
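
To make the expectation concrete, here is a minimal sketch of the decision I 
would expect for an EQUALS predicate. It is not the ORC implementation: the 
class RowGroupPruningSketch, the simplified Truth enum and both methods are 
hypothetical names made up for illustration, and the sketch assumes that a 
recorded numberOfValues of 0 can be trusted.

```java
// Hypothetical, simplified model. These are NOT the ORC library's own classes.
final class RowGroupPruningSketch {

    /** Simplified truth values, loosely named after the ones seen in the log. */
    enum Truth {
        YES_NO_NULL, // predicate may be true, false or null: row group must be read
        NULL,        // every value is null: an equality test can never be true
        NO;          // there is no value that could match

        /** A row group is only worth reading if the predicate might be true. */
        boolean isNeeded() {
            return this == YES_NO_NULL;
        }
    }

    /**
     * Decide an EQUALS predicate from the two statistics shown in the trace.
     * Assumption: numberOfValues counts the non-null values in the row group.
     */
    static Truth evaluateEquals(long numberOfValues, boolean hasNull) {
        if (numberOfValues == 0) {
            // Nothing to compare against, so the literal cannot be present.
            return hasNull ? Truth.NULL : Truth.NO;
        }
        // A real reader would compare the literal with min/max here;
        // this sketch stays conservative and keeps the row group.
        return Truth.YES_NO_NULL;
    }

    /** True if the row group has to be read for the EQUALS predicate. */
    static boolean shouldReadRowGroup(long numberOfValues, boolean hasNull) {
        return evaluateEquals(numberOfValues, hasNull).isNeeded();
    }
}
```

With the statistics from the trace above (numberOfValues: 0, hasNull: false), 
shouldReadRowGroup(0, false) returns false in this sketch, so row group 
340000 to 349999 would be skipped instead of included.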



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
