Vaibhav Gumashta created HIVE-21458:
---------------------------------------

             Summary: ACID: Optimize AcidUtils$MetaDataFile.isRawFormat check by caching the split reader
                 Key: HIVE-21458
                 URL: https://issues.apache.org/jira/browse/HIVE-21458
             Project: Hive
          Issue Type: Bug
          Components: Transactions
    Affects Versions: 3.1.1
            Reporter: Vaibhav Gumashta


In the transactional subsystem, we check in several places whether a data 
file has ROW__ID fields. Every time we do so (even within the context of the 
same query), we open a Reader for that file/split. We could optimize this by 
caching the result; we may also not need to perform the check for every split. 
An example call stack:
{code}
OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
OrcInputFormat.getReader(InputSplit, Options) line: 2108
OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
FetchOperator.getRecordReader() line: 344
FetchOperator.getNextRow() line: 540
FetchOperator.pushRow() line: 509
FetchTask.fetch(List) line: 146
{code} 

In this path, the check (and the Reader open behind it) is repeated for every split.
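One possible shape for the proposed caching is a per-query map from file path to the isRawFormat answer, so the expensive Reader open happens at most once per file. The sketch below is hypothetical, not Hive's actual code: {{RawFormatCache}} and the injected check callback are illustrative names, with the callback standing in for the {{OrcFile.createReader}} call.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a per-query memoizing cache for the raw-format check.
// The expensiveCheck callback stands in for opening an ORC Reader on the file.
class RawFormatCache {
    private final ConcurrentHashMap<String, Boolean> cache = new ConcurrentHashMap<>();
    private final Function<String, Boolean> expensiveCheck;

    RawFormatCache(Function<String, Boolean> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    boolean isRawFormat(String path) {
        // computeIfAbsent runs the expensive check only the first time a path is seen;
        // subsequent splits over the same file hit the cached answer.
        return cache.computeIfAbsent(path, expensiveCheck);
    }
}
{code}

The cache would need to be scoped to a query (or invalidated on compaction), since a file's format is stable for its lifetime but paths can be reused across delta/base rewrites.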



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
