[ https://issues.apache.org/jira/browse/ORC-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799307#comment-17799307 ]

Alexander Petrossian (PAF) commented on ORC-1553:
-------------------------------------------------

Looking at the row group statistics with orc-tools also shows something unusual.
Statistics are present for every row group, but the entries with count: 0 (Entries 3, 8, 17 and 19) carry no min/max/sum information.

{noformat}
orc-tools meta -r 384 MAJOR-2023-11-21.orc
    Row group indices for column 384:
      Entry 0: count: 4 hasNull: false min: 250997347833691 max: 79689367592 sum: 52 positions: 0,0,0,0,0
      Entry 1: count: 4 hasNull: false min: 250997347833691 max: 79689367592 sum: 52 positions: 0,52,0,0,4
      Entry 2: count: 2 hasNull: false min: 250997347833691 max: 79670734942 sum: 26 positions: 0,104,0,0,8
      Entry 3: count: 0 hasNull: false positions: 0,130,0,0,10
      Entry 4: count: 2 hasNull: false min: 250997350144447 max: 79689367592 sum: 26 positions: 0,130,0,0,10
      Entry 5: count: 2 hasNull: false min: 250997347833691 max: 79670734942 sum: 26 positions: 0,156,0,0,12
      Entry 6: count: 4 hasNull: false min: 250997347833691 max: 79689367592 sum: 52 positions: 0,182,0,0,14
      Entry 7: count: 2 hasNull: false min: 250997350144447 max: 79689367592 sum: 26 positions: 0,234,0,0,18
      Entry 8: count: 0 hasNull: false positions: 0,260,0,0,20
      Entry 9: count: 2 hasNull: false min: 250997347833691 max: 79670734942 sum: 26 positions: 0,260,0,0,20
      Entry 10: count: 8 hasNull: false min: 250997350144447 max: 79689367592 sum: 104 positions: 0,286,0,0,22
      Entry 11: count: 2 hasNull: false min: 250997347833691 max: 79670734942 sum: 26 positions: 0,390,0,0,30
      Entry 12: count: 10 hasNull: false min: 250997347833691 max: 79689367592 sum: 130 positions: 0,416,0,0,32
      Entry 13: count: 4 hasNull: false min: 250997347833691 max: 79689367592 sum: 52 positions: 0,546,0,0,42
      Entry 14: count: 14 hasNull: false min: 250997347833691 max: 79689367592 sum: 182 positions: 0,598,0,0,46
      Entry 15: count: 14 hasNull: false min: 250997347833691 max: 79689367592 sum: 182 positions: 0,780,0,0,60
      Entry 16: count: 4 hasNull: false min: 250997347833691 max: 79689367592 sum: 52 positions: 0,962,0,0,74
      Entry 17: count: 0 hasNull: false positions: 0,1014,0,0,78
      Entry 18: count: 6 hasNull: false min: 250997347833691 max: 79689367592 sum: 78 positions: 0,1014,0,0,78
      Entry 19: count: 0 hasNull: false positions: 0,1092,0,0,84
{noformat}
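A minimal sketch for picking out the affected entries from output like the dump above, assuming each `Entry N: count: ...` record sits on a single line. `RowIndexScan`, `emptyEntries` and `hasMinMax` are hypothetical helpers for illustration, not part of orc-tools or the ORC library; they work on the printed text only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of orc-tools): finds row-group entries with
// count 0 in text shaped like the `orc-tools meta -r` output above.
class RowIndexScan {
    // Matches lines such as
    // "Entry 3: count: 0 hasNull: false positions: 0,130,0,0,10".
    private static final Pattern ENTRY =
            Pattern.compile("Entry (\\d+): count: (\\d+) hasNull: (?:true|false)(.*)");

    // Returns the indices of entries whose count is 0.
    static List<Integer> emptyEntries(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line.trim());
            if (m.matches() && Long.parseLong(m.group(2)) == 0) {
                result.add(Integer.parseInt(m.group(1)));
            }
        }
        return result;
    }

    // True if the entry line carries both min and max values.
    static boolean hasMinMax(String line) {
        return line.contains(" min: ") && line.contains(" max: ");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Entry 2: count: 2 hasNull: false min: 250997347833691 max: 79670734942 sum: 26 positions: 0,104,0,0,8",
                "Entry 3: count: 0 hasNull: false positions: 0,130,0,0,10",
                "Entry 4: count: 2 hasNull: false min: 250997350144447 max: 79689367592 sum: 26 positions: 0,130,0,0,10");
        System.out.println(emptyEntries(lines));     // [3]
        System.out.println(hasMinMax(lines.get(1))); // false
    }
}
```

Run over the full dump, this would report Entries 3, 8, 17 and 19, none of which has min/max.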

Hope I'm not bugging you. Trying to be helpful ;)

> Reading information from Row group, where there are 0 records of SArg column
> ----------------------------------------------------------------------------
>
>                 Key: ORC-1553
>                 URL: https://issues.apache.org/jira/browse/ORC-1553
>             Project: ORC
>          Issue Type: Bug
>    Affects Versions: 1.9.2
>            Reporter: Alexander Petrossian (PAF)
>            Priority: Major
>         Attachments: MAJOR-2023-11-21.orc, Снимок экрана 2023-12-21 в 10.00.23.png (Screenshot 2023-12-21 at 10.00.23)
>
>
> We created the .orc file using the Apache ORC library; I cannot provide a
> reproducible way to create such a file.
> Statistics are present for 100% of the row groups (checked with an ORC dump).
> However, when we search that file, we see very strange behavior:
> {code}
> TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
> stringStatistics {
> }
> hasNull: false
> TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) to YES_NO_NULL
> DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is included.
> {code}
> If the existing statistics report 0 values, there is obviously no need to
> read that row group.
> Yet we get a YES_NO_NULL decision, which forces that row group to be included
> in the subsequent read; this is pointless and bad for performance.
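The skip decision the report asks for can be sketched as below. This is a simplified, hypothetical model, not the code in `org.apache.orc.impl.RecordReaderImpl`: the enum only mirrors the constant names of ORC's `SearchArgument.TruthValue`, and `Stats`, `evaluateEquals` and `isNeeded` are illustrative. The column is modeled as a string column, as the `stringStatistics` in the trace suggests, and the sample values are borrowed from the dump in the comment above.

```java
// Hypothetical, simplified model (NOT ORC's RecordReaderImpl code) of
// evaluating `column = literal` against one row group's string statistics.
class RowGroupPruning {
    // Constant names mirror ORC's SearchArgument.TruthValue.
    enum TruthValue {
        YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL;

        // Sketch assumption: a row group can be skipped only when the
        // predicate can never be true for it.
        boolean isNeeded() {
            return this != NO && this != NULL && this != NO_NULL;
        }
    }

    // Simplified row-group statistics for a string column.
    static final class Stats {
        final long numberOfValues; // non-null values ("count" in the dump)
        final boolean hasNull;
        final String min;          // null when the entry has no min/max
        final String max;

        Stats(long numberOfValues, boolean hasNull, String min, String max) {
            this.numberOfValues = numberOfValues;
            this.hasNull = hasNull;
            this.min = min;
            this.max = max;
        }
    }

    static TruthValue evaluateEquals(Stats stats, String literal) {
        if (stats.numberOfValues == 0) {
            // No non-null values: EQUALS can never be true, so the group
            // is skippable. This is the case described in ORC-1553.
            return stats.hasNull ? TruthValue.NULL : TruthValue.NO;
        }
        if (stats.min == null || stats.max == null) {
            // Values exist but their range is unknown: keep the group.
            return stats.hasNull ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
        }
        // String.compareTo orders by UTF-16 code units; for ASCII digit
        // strings like these it agrees with byte-wise (UTF-8) order.
        if (literal.compareTo(stats.min) < 0 || literal.compareTo(stats.max) > 0) {
            return stats.hasNull ? TruthValue.NO_NULL : TruthValue.NO;
        }
        if (stats.min.equals(stats.max)) {
            // Literal lies in [min, max] and min == max: every non-null value matches.
            return stats.hasNull ? TruthValue.YES_NULL : TruthValue.YES;
        }
        return stats.hasNull ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
    }

    public static void main(String[] args) {
        // Like Entry 3 in the dump: count: 0 hasNull: false, no min/max.
        Stats empty = new Stats(0, false, null, null);
        TruthValue a = evaluateEquals(empty, "71231231212");
        System.out.println(a + " needed=" + a.isNeeded()); // NO needed=false
        // Like Entry 0: compared as strings, "71231231212" falls inside [min, max].
        Stats entry0 = new Stats(4, false, "250997347833691", "79689367592");
        TruthValue b = evaluateEquals(entry0, "71231231212");
        System.out.println(b + " needed=" + b.isNeeded()); // YES_NO needed=true
    }
}
```

Under this model the empty row group evaluates to NO and is skipped, whereas the trace above shows YES_NO_NULL, which keeps it. Checking `numberOfValues == 0` before touching min/max is what lets an entry without min/max still be pruned.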



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
