[
https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037675#comment-17037675
]
Syed Shameerur Rahman edited comment on HIVE-22891 at 2/17/20 6:26 AM:
-----------------------------------------------------------------------
[~sershe] [~ashutoshc] [~prasanth_j] [~gopalv] Please review the patch.
was (Author: srahman):
[~ashutoshc] [~prasanth_j] [~gopalv] Please review the patch.
> Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
> ------------------------------------------------------------------------------
>
> Key: HIVE-22891
> URL: https://issues.apache.org/jira/browse/HIVE-22891
> Project: Hive
> Issue Type: Task
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22891.01.patch
>
>
> {code:java}
> try {
>   // TODO: refactor this out
>   if (pathToPartInfo == null) {
>     MapWork mrwork;
>     if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
>       mrwork = (MapWork) Utilities.getMergeWork(jobConf);
>       if (mrwork == null) {
>         mrwork = Utilities.getMapWork(jobConf);
>       }
>     } else {
>       mrwork = Utilities.getMapWork(jobConf);
>     }
>     pathToPartInfo = mrwork.getPathToPartitionInfo();
>   }
>   PartitionDesc part = extractSinglePartSpec(hsplit);
>   inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
> } catch (HiveException e) {
>   throw new IOException(e);
> }
> {code}
> The code above, in CombineHiveRecordReader.java, was introduced in
> HIVE-15147. It overwrites inputFormat based on the PartitionDesc, which is
> unnecessary in non-LLAP execution mode because
> HiveInputFormat.wrapForLlap() simply returns the previously defined
> inputFormat in that mode. The extractSinglePartSpec() call is also expensive:
> when there are a large number of small files, each call takes roughly
> 2 to 3 seconds. As a result, the same query runs much slower on the latest
> Hive than on Hive 1.x / Hive 2.
> {code:java}
> 2020-02-11 07:15:04,701 INFO [main]
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from
> 2020-02-11 07:15:06,468 WARN [main]
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions
> found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03,
> hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}}
> 2020-02-11 07:15:06,468 INFO [main]
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting
> org.apache.hadoop.mapred.FileSplit
> {code}
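
For reviewers, the idea in the description can be sketched roughly as follows. This is only an illustration and not the contents of HIVE-22891.01.patch: the class LlapGuardSketch, the boolean llapIoEnabled flag, and the String-based stand-ins for extractSinglePartSpec() and wrapForLlap() are all hypothetical, and no real Hive classes are used. It shows that when the wrapper is a no-op outside LLAP, the expensive extraction can be skipped without changing the result.

```java
public class LlapGuardSketch {
    // Counts invocations of the expensive stand-in so the saving is observable.
    static int extractCalls = 0;

    // Hypothetical stand-in for extractSinglePartSpec(hsplit); the real call is
    // the one reported as taking seconds when there are many small files.
    static String extractSinglePartSpec() {
        extractCalls++;
        return "logdate=2020-02-03/hour=01";
    }

    // Hypothetical stand-in for HiveInputFormat.wrapForLlap(): as described in
    // the issue, it returns the input format unchanged in non-LLAP mode.
    static String wrapForLlap(String inputFormat, boolean llapIoEnabled, String part) {
        if (!llapIoEnabled) {
            return inputFormat;
        }
        return "LlapWrapped(" + inputFormat + ", " + part + ")";
    }

    // Guarded version: only pay for the extraction when the wrapper would
    // actually use its result.
    static String resolveInputFormat(String inputFormat, boolean llapIoEnabled) {
        if (!llapIoEnabled) {
            return inputFormat; // same result as calling wrapForLlap, minus the extraction
        }
        String part = extractSinglePartSpec();
        return wrapForLlap(inputFormat, llapIoEnabled, part);
    }

    public static void main(String[] args) {
        System.out.println(resolveInputFormat("OrcInputFormat", false));
        System.out.println("extract calls (non-LLAP): " + extractCalls);
        System.out.println(resolveInputFormat("OrcInputFormat", true));
        System.out.println("extract calls (after LLAP): " + extractCalls);
    }
}
```

How LLAP mode is actually detected in CombineHiveRecordReader is left to the patch; the boolean flag here is just a placeholder for that check.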