[ 
https://issues.apache.org/jira/browse/HIVE-24710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-24710:
------------------------------------
    Description: 
PTFRowContainer could be reading the first block repeatedly. The default 
block size is around 25000 rows. For the first 25000 values of rowIdx, getAt() 
would read that block repeatedly because of the 
"rowIdx < currentReadBlockStartRow" condition.

{noformat}
  public Row getAt(int rowIdx) throws HiveException {
    int blockSize = getBlockSize();
    if (rowIdx < currentReadBlockStartRow || rowIdx >= currentReadBlockStartRow + blockSize) {
      readBlock(getBlockNum(rowIdx));
    }
    return getReadBlockRow(rowIdx - currentReadBlockStartRow);
  }
{noformat} 

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java#L167
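
One way to check whether a guard like the one in getAt() triggers redundant 
reads is to count readBlock() calls in a small stand-alone model. The sketch 
below is a hypothetical simplification and not Hive's implementation: the class 
BlockReadModel, the blockReads counter, the in-memory row list, and the 
"no block cached" initial state are all assumptions made for illustration. In 
this model readBlock() always updates currentReadBlockStartRow, so a sequential 
scan reads each block once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a block-cached row container.
// This is NOT Hive's PTFRowContainer; field names mirror the snippet
// above only to make the comparison easier to follow.
class BlockReadModel {
    private final int blockSize;
    private final List<Integer> rows = new ArrayList<>(); // stand-in for spilled rows
    private List<Integer> currentBlock = new ArrayList<>();
    private int currentReadBlockStartRow; // first row index of the cached block
    int blockReads = 0;                   // number of times readBlock() ran

    BlockReadModel(int blockSize, int rowCount) {
        this.blockSize = blockSize;
        for (int i = 0; i < rowCount; i++) {
            rows.add(i);
        }
        // Assumption: start with no block cached, so the first access must read.
        this.currentReadBlockStartRow = -blockSize;
    }

    private void readBlock(int blockNum) {
        blockReads++;
        currentReadBlockStartRow = blockNum * blockSize;
        int end = Math.min(currentReadBlockStartRow + blockSize, rows.size());
        currentBlock = new ArrayList<>(rows.subList(currentReadBlockStartRow, end));
    }

    int getAt(int rowIdx) {
        // Same shape of guard as in the snippet: reload only when rowIdx
        // falls outside the currently cached block.
        if (rowIdx < currentReadBlockStartRow
                || rowIdx >= currentReadBlockStartRow + blockSize) {
            readBlock(rowIdx / blockSize);
        }
        return currentBlock.get(rowIdx - currentReadBlockStartRow);
    }

    public static void main(String[] args) {
        BlockReadModel model = new BlockReadModel(4, 10);
        for (int i = 0; i < 10; i++) {
            model.getAt(i);
        }
        // 10 rows with a block size of 4 span blocks 0, 1 and 2.
        System.out.println("block reads: " + model.blockReads); // prints "block reads: 3"
    }
}
```

If a similar counter added around readBlock() in the real class reported more 
reads than blocks for a sequential scan, that would suggest 
currentReadBlockStartRow or the block size is out of sync with the cached block.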

 

  was:
PTFRowContainer could be reading the first block repeatedly. The default 
block size is around 25000 rows. For the first 25000 values of rowIdx, getAt() 
would read that block repeatedly because of the 
"rowIdx < currentReadBlockStartRow" condition.

{noformat}
  public Row getAt(int rowIdx) throws HiveException {
    int blockSize = getBlockSize();
    if (rowIdx < currentReadBlockStartRow || rowIdx >= currentReadBlockStartRow + blockSize) {
      readBlock(getBlockNum(rowIdx));
    }
    return getReadBlockRow(rowIdx - currentReadBlockStartRow);
  }
{noformat} 

 


> PTFRowContainer could be reading more number of blocks than needed
> ------------------------------------------------------------------
>
>                 Key: HIVE-24710
>                 URL: https://issues.apache.org/jira/browse/HIVE-24710
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: performance
>
> PTFRowContainer could be reading the first block repeatedly. The default 
> block size is around 25000 rows. For the first 25000 values of rowIdx, 
> getAt() would read that block repeatedly because of the 
> "rowIdx < currentReadBlockStartRow" condition.
> {noformat}
>   public Row getAt(int rowIdx) throws HiveException {
>     int blockSize = getBlockSize();
>     if (rowIdx < currentReadBlockStartRow || rowIdx >= currentReadBlockStartRow + blockSize) {
>       readBlock(getBlockNum(rowIdx));
>     }
>     return getReadBlockRow(rowIdx - currentReadBlockStartRow);
>   }
> {noformat} 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java#L167
>  



