[
https://issues.apache.org/jira/browse/HIVE-21458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prasanth Jayachandran reassigned HIVE-21458:
--------------------------------------------
Assignee: (was: Prasanth Jayachandran)
> ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
> --------------------------------------------------
>
> Key: HIVE-21458
> URL: https://issues.apache.org/jira/browse/HIVE-21458
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.1.1
> Reporter: Vaibhav Gumashta
> Priority: Major
> Attachments: async-prof-pid-1-cpu-1.svg
>
>
> In the transactional subsystem, several code paths check whether a data
> file has ROW__ID fields. Every such check (even within the context of the
> same query) opens a Reader for that file/split. We could optimize this by
> caching, or by checking once and saving the result for later. Also, we may
> not need to do this for every split. An example
> call stack:
> {code}
> OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
> AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
> AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
> AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
> OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
> OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
> OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
> OrcInputFormat.getReader(InputSplit, Options) line: 2108
> OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
> FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
> FetchOperator.getRecordReader() line: 344
> FetchOperator.getNextRow() line: 540
> FetchOperator.pushRow() line: 509
> FetchTask.fetch(List) line: 146
> {code}
> Here, for each split we'll make that check.
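
A minimal sketch of the "check once and save the result" idea from the description, assuming the answer for a given file does not change during a query. The class name `RawFormatCache` and the `probe` parameter are hypothetical and not part of Hive. In Hive the probe would correspond to the Reader-opening check seen in the call stack, but here it is an arbitrary predicate on a path string so that the example stays self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not a Hive class): memoizes an expensive per-file
 * boolean check so that it runs at most once per distinct path.
 */
final class RawFormatCache {
    // Maps a file path to the cached result of the check.
    private final ConcurrentMap<String, Boolean> cache = new ConcurrentHashMap<>();

    // The expensive check; an assumption standing in for "open a Reader and inspect the schema".
    private final Predicate<String> probe;

    RawFormatCache(Predicate<String> probe) {
        this.probe = probe;
    }

    /** Returns the cached answer, invoking the probe only on the first request for a path. */
    boolean isRawFormat(String path) {
        return cache.computeIfAbsent(path, probe::test);
    }
}
```

With such a cache scoped to a single query, repeated calls for the same file from different splits would hit the map rather than open another Reader. How long entries may safely live, and where the cache would be held, are design questions this sketch leaves open.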