[ 
https://issues.apache.org/jira/browse/ORC-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799302#comment-17799302
 ] 

Alexander Petrossian (PAF) commented on ORC-1553:
-------------------------------------------------

Hurray, I found a small enough file that shows the problem!
 [^MAJOR-2023-11-21.orc] 

{noformat}
stripe = {ReaderImpl$StripeInformationImpl@2378} "offset: 3 data: 16260636 rows: 190744 tail: 10482 index: 86797"
columnIx = 384

entry = {OrcProto$RowIndexEntry@2521} "positions: 0\npositions: 130\npositions: 0\npositions: 0\npositions: 10\nstatistics {\n  numberOfValues: 0\n  stringStatistics {\n  }\n  hasNull: false\n}\n"
stats = {OrcProto$ColumnStatistics@2522} "numberOfValues: 0\nstringStatistics {\n}\nhasNull: false\n"
 bitField0_ = 521
 numberOfValues_ = 0
 intStatistics_ = null
 doubleStatistics_ = null
 stringStatistics_ = {OrcProto$StringStatistics@2548} ""
  bitField0_ = 0
  minimum_ = ""
  maximum_ = ""
  sum_ = 0
  lowerBound_ = ""
  upperBound_ = ""
  memoizedIsInitialized = -1
  unknownFields = {UnknownFieldSet@2549} ""
  memoizedSize = -1
  memoizedHashCode = 0
 bucketStatistics_ = null
 decimalStatistics_ = null
 dateStatistics_ = null
 binaryStatistics_ = null
 timestampStatistics_ = null
 hasNull_ = false
 bytesOnDisk_ = 0
 collectionStatistics_ = null
 memoizedIsInitialized = -1
 unknownFields = {UnknownFieldSet@2549} ""
 memoizedSize = -1
 memoizedHashCode = 0
{noformat}

{noformat}
TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
stringStatistics {
}
hasNull: false

TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS data.request.eventItem._elem.UsageEventItem.usage.CustomerFacingServiceUsage.relatedParty._elem.resource._elem.value 71231231212) to YES_NO_NULL
DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 30000 to 39999 is included.
{noformat}

> Reading information from Row group, where there are 0 records of SArg column
> ----------------------------------------------------------------------------
>
>                 Key: ORC-1553
>                 URL: https://issues.apache.org/jira/browse/ORC-1553
>             Project: ORC
>          Issue Type: Bug
>    Affects Versions: 1.9.2
>            Reporter: Alexander Petrossian (PAF)
>            Priority: Major
>         Attachments: MAJOR-2023-11-21.orc, Снимок экрана 2023-12-21 в 
> 10.00.23.png
>
>
> We created an .orc file using the Apache ORC library; I cannot provide a 
> reproducible way to create such a file.
> We have statistics for 100% of the row groups, checked with orc dump.
> But when we search that file, we see very strange behavior:
> {code}
> TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
> stringStatistics {
> }
> hasNull: false
> TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) to YES_NO_NULL
> DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is included.
> {code}
> If the existing statistics report 0 values, there is obviously no need to 
> read that row group.
> And yet we get a YES_NO_NULL decision, which forces inclusion of that row group 
> in the subsequent read; this is pointless and bad for performance.
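
The decision the description asks for could be sketched roughly as follows. This is only an illustration under assumptions, not the code in RecordReaderImpl: the class RowGroupPruningSketch, the Truth enum, and the methods evaluateEquals and mustRead are hypothetical names made up for this sketch, and only the numberOfValues and hasNull fields from the statistics shown above are used as inputs.

```java
// Hypothetical sketch, not ORC's implementation: decide whether a row group
// has to be read for an EQUALS predicate, using only the two statistics
// fields visible in the trace above (numberOfValues and hasNull).
class RowGroupPruningSketch {

    // Simplified stand-in for a search-argument truth value (assumed names).
    enum Truth { YES, NO, NULL, YES_NO_NULL }

    // Evaluate "column = literal" against the row-group statistics.
    static Truth evaluateEquals(long numberOfValues, boolean hasNull) {
        if (numberOfValues == 0) {
            // No non-null values at all: the predicate can never be true.
            // If nulls are present the result is NULL, otherwise plain NO.
            return hasNull ? Truth.NULL : Truth.NO;
        }
        // Without looking at min/max we cannot rule anything out.
        return Truth.YES_NO_NULL;
    }

    // A row group only needs to be read if the predicate might be true.
    static boolean mustRead(Truth t) {
        return t == Truth.YES || t == Truth.YES_NO_NULL;
    }

    public static void main(String[] args) {
        // Statistics from the trace: numberOfValues: 0, hasNull: false.
        Truth t = evaluateEquals(0, false);
        System.out.println(t + " -> read row group: " + mustRead(t));
        // prints "NO -> read row group: false"
    }
}
```

With the statistics from the trace (numberOfValues: 0, hasNull: false) this sketch yields NO, so the row group would be skipped instead of included.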



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
