[
https://issues.apache.org/jira/browse/HIVE-19588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480196#comment-16480196
]
Prasanth Jayachandran commented on HIVE-19588:
----------------------------------------------
Per [~gopalv], the performance slowdown comes from creating
VectorizedOrcAcidRowBatchReader inside an inner loop for LLAP. I removed that
in the .2 patch. [~ekoifman], can you review this diff
[https://reviews.apache.org/r/67197/diff/1-2/]?
> Several invocations of file listing when creating
> VectorizedOrcAcidRowBatchReader
> --------------------------------------------------------------------------------
>
> Key: HIVE-19588
> URL: https://issues.apache.org/jira/browse/HIVE-19588
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.1.0
> Reporter: Nita Dembla
> Assignee: Prasanth Jayachandran
> Priority: Major
> Attachments: HIVE-19588.1.patch, HIVE-19588.2.patch, Screen Shot
> 2018-05-16 at 2.23.25 PM.png
>
>
> It looks like we list files several times when creating a single instance
> of VectorizedOrcAcidRowBatchReader.
> AcidUtils.parseBaseOrDeltaBucketFilename() does a full file listing (when
> there are files with the bucket_* prefix) just to pick a single file out of
> a path and check whether it has the ACID schema (as part of HIVE-18190).
> Full file listings also happen in these places:
> 1) When populating ColumnizedDeleteEventRegistry
> 2) When populating SortMergedDeleteEventRegistry
> 3) Twice in computeOffsetAndBucket()
>
> Attaching profiles which [~gopalv] took while debugging.
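
One way to picture the problem in the quoted report is a per-reader cache that lists a directory once and lets every later caller reuse the result. The sketch below is only an illustration under stated assumptions: ListingCache, its methods, and the in-memory lister are hypothetical names and are not part of Hive or of the attached patches. The lister function stands in for a real file-system listing call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not a Hive class): lists each directory at most once.
class ListingCache {
    // Stand-in for a real file-system listing call (assumption for this sketch).
    private final Function<String, List<String>> lister;
    private final Map<String, List<String>> cache = new HashMap<>();
    private int listCalls = 0; // number of listings that actually hit the lister

    ListingCache(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    // Returns the cached listing; calls the lister only on first use per directory.
    List<String> list(String dir) {
        List<String> files = cache.get(dir);
        if (files == null) {
            listCalls++;
            files = lister.apply(dir);
            cache.put(dir, files);
        }
        return files;
    }

    // Picks the first bucket_* file from the shared listing, or null if there is none.
    String firstBucketFile(String dir) {
        for (String name : list(dir)) {
            if (name.startsWith("bucket_")) {
                return name;
            }
        }
        return null;
    }

    int listCalls() {
        return listCalls;
    }

    public static void main(String[] args) {
        ListingCache cache = new ListingCache(
            dir -> Arrays.asList("_orc_acid_version", "bucket_00000", "bucket_00001"));
        cache.firstBucketFile("delta_1_1"); // first use triggers the only listing
        cache.list("delta_1_1");            // served from the cache
        cache.list("delta_1_1");            // served from the cache
        System.out.println(cache.listCalls()); // prints 1
    }
}
```

A cache like this trades freshness for fewer listing calls, so it would only suit code paths where the directory contents are not expected to change during the lifetime of one reader.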