[
https://issues.apache.org/jira/browse/HIVE-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062775#comment-15062775
]
Prasanth Jayachandran commented on HIVE-12682:
----------------------------------------------
I see that taskId is already stored in a member variable during initialization.
We should use that instead of getting it from the conf. There are other places too
that get the taskId from the conf object. I will upload a patch shortly to use the
member variable in the inner loop.
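A minimal sketch of the change described above, under stated assumptions: the task id is read from the configuration once during initialization and cached in a field, rather than being looked up again for every row in the inner loop. `CountingConf`, `SketchSinkOperator`, and the `mapred.task.id` key are hypothetical stand-ins for illustration. They are not Hive's FileSinkOperator or Hadoop's Configuration API. `CountingConf` only counts lookups so the difference between the two paths is visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Configuration: every get() is counted
// so the cost of repeated per-row lookups becomes visible.
class CountingConf {
    private final Map<String, String> props = new HashMap<>();
    private int lookups = 0;

    void set(String key, String value) { props.put(key, value); }

    String get(String key) {
        lookups++;                      // each call represents one config lookup
        return props.get(key);
    }

    int lookups() { return lookups; }
}

// Hypothetical simplified sink operator (not Hive's FileSinkOperator).
class SketchSinkOperator {
    private static final String TASK_ID_KEY = "mapred.task.id"; // assumed key name
    private final CountingConf conf;
    private String taskId;              // cached once in initialize()

    SketchSinkOperator(CountingConf conf) { this.conf = conf; }

    void initialize() {
        taskId = conf.get(TASK_ID_KEY); // single lookup at setup time
    }

    // Slow path: re-reads the task id from the configuration for every row.
    String processFromConf(String row) {
        return conf.get(TASK_ID_KEY) + "/" + row;
    }

    // Fast path: uses the member variable set during initialize().
    String processCached(String row) {
        return taskId + "/" + row;
    }
}

public class TaskIdCachingDemo {
    public static void main(String[] args) {
        CountingConf conf = new CountingConf();
        conf.set("mapred.task.id", "attempt_0001_r_000000_0");
        SketchSinkOperator op = new SketchSinkOperator(conf);
        op.initialize();                // 1 lookup so far

        int before = conf.lookups();
        for (int i = 0; i < 1000; i++) op.processFromConf("row" + i);
        System.out.println("per-row lookups: " + (conf.lookups() - before));

        before = conf.lookups();
        for (int i = 0; i < 1000; i++) op.processCached("row" + i);
        System.out.println("cached lookups: " + (conf.lookups() - before));
    }
}
```

Running `main` prints `per-row lookups: 1000` followed by `cached lookups: 0`: the cached path never touches the configuration inside the loop.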
> Reducers in dynamic partitioning job spend a lot of time running
> hadoop.conf.Configuration.getOverlay
> -----------------------------------------------------------------------------------------------------
>
> Key: HIVE-12682
> URL: https://issues.apache.org/jira/browse/HIVE-12682
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.1
> Reporter: Carter Shanklin
> Assignee: Gopal V
> Attachments: reducer.png
>
>
> I tested this on Hive 1.2.1, but it looks like it's still applicable to 2.0.
> I ran this query:
> {code}
> create table flights (
> …
> )
> PARTITIONED BY (Year int)
> CLUSTERED BY (Month)
> SORTED BY (DayofMonth) into 12 buckets
> STORED AS ORC
> TBLPROPERTIES("orc.bloom.filter.columns"="*")
> ;
> {code}
> (Taken from here:
> https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)
> I profiled just the reduce phase and noticed something odd: the attached
> graph shows where time was spent during the reduce phase.
> !reducer.png!
> The problem seems to relate to
> https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903
> /cc [~gopalv]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)