[ https://issues.apache.org/jira/browse/TEZ-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146195#comment-16146195 ]
Jonathan Eagles commented on TEZ-3824:
--------------------------------------
There is redundant work when spilling to disk. TEZ-444 (0.2 line) replaced
the JobConf with a Configuration, so with the new combiner API a fresh JobConf
copy is constructed on every spill. We can cache a JobConf to avoid the
redundant creations.
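
A minimal sketch of the caching idea, not the actual patch. Every class here
(SketchConfiguration, SketchJobConf, SketchJobContext, SketchCombiner) is a
hypothetical stand-in for the corresponding Hadoop or Tez class, written only
to show that passing an already-built JobConf lets the context constructor
skip the copy branch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class SketchConfiguration {
    protected final Map<String, String> props;

    SketchConfiguration() {
        this.props = new HashMap<>();
    }

    // Copy constructor: duplicates every property, which is the costly step.
    SketchConfiguration(SketchConfiguration other) {
        this.props = new HashMap<>(other.props);
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        return props.get(key);
    }
}

// Hypothetical stand-in for org.apache.hadoop.mapred.JobConf.
class SketchJobConf extends SketchConfiguration {
    // Counts full copies; exists only to make the effect observable.
    static int copiesMade = 0;

    SketchJobConf(SketchConfiguration other) {
        super(other);
        copiesMade++;
    }
}

// Hypothetical stand-in for JobContextImpl: copies unless handed a JobConf.
class SketchJobContext {
    final SketchJobConf conf;

    SketchJobContext(SketchConfiguration conf) {
        if (conf instanceof SketchJobConf) {
            this.conf = (SketchJobConf) conf;    // reuse, no copy
        } else {
            this.conf = new SketchJobConf(conf); // full copy
        }
    }
}

// Hypothetical combiner that builds its JobConf once and reuses it per spill.
class SketchCombiner {
    private final SketchJobConf cachedConf;

    SketchCombiner(SketchConfiguration conf) {
        // Convert once at construction time instead of once per spill.
        this.cachedConf = (conf instanceof SketchJobConf)
            ? (SketchJobConf) conf
            : new SketchJobConf(conf);
    }

    // Called once per spill; the context reuses the cached JobConf.
    SketchJobContext createContextForSpill() {
        return new SketchJobContext(cachedConf);
    }
}
```

With this shape, the number of copies stays at one per combiner regardless of
how many spills occur. One trade-off to keep in mind: the cached copy is a
snapshot, so changes made to the original Configuration after the combiner is
constructed would not be visible to later spills.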
> MRCombiner creates new JobConf copy per spill
> ---------------------------------------------
>
> Key: TEZ-3824
> URL: https://issues.apache.org/jira/browse/TEZ-3824
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
>
> {noformat:title=scope-57(HASH_JOIN) stack trace}
> "SpillThread {scope_60_" #99 daemon prio=5 os_prio=0 tid=0x00007f2128d21800 nid=0x7487 runnable [0x00007f21154c4000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1012)
>         at java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:1084)
>         at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:852)
>         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:728)
>         - locked <0x00000000d1dc5240> (a org.apache.hadoop.conf.Configuration)
>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:442)
>         at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:67)
>         at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.<init>(TaskAttemptContextImpl.java:49)
>         at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.<init>(TaskInputOutputContextImpl.java:54)
>         at org.apache.hadoop.mapreduce.task.ReduceContextImpl.<init>(ReduceContextImpl.java:95)
>         at org.apache.tez.mapreduce.combine.MRCombiner.createReduceContext(MRCombiner.java:237)
>         at org.apache.tez.mapreduce.combine.MRCombiner.runNewCombiner(MRCombiner.java:181)
>         at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:115)
>         at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.runCombineProcessor(ExternalSorter.java:313)
>         at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:937)
>         at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:861)
>         at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:780)
> {noformat}
> {code:title=JobConf copy construction for tez}
>   public JobContextImpl(Configuration conf, JobID jobId) {
>     if (conf instanceof JobConf) {
>       this.conf = (JobConf)conf;
>     } else {
>       this.conf = new JobConf(conf); // <---- JobConf copy
>     }
>     this.jobId = jobId;
>     this.credentials = this.conf.getCredentials();
>     try {
>       this.ugi = UserGroupInformation.getCurrentUser();
>     } catch (IOException e) {
>       throw new RuntimeException(e);
>     }
>   }
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)