[ https://issues.apache.org/jira/browse/TEZ-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368598#comment-15368598 ]
Rohini Palaniswamy commented on TEZ-3332:
-----------------------------------------
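The example below is on tiny data, so every output finished fast. Each output
is still flushed only after the previous one has finished its spill, so for
larger data, parallelizing can provide a considerable speedup.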
{code}
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Sorting & Spilling map output. bufstart = 0, bufend = 4091674, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67104732(268418928), length = 4129/16777216
2016-07-07 21:39:23,419 [INFO] [TezChild] |compress.CodecPool|: Got brand-new compressor [.lzo_deflate]
2016-07-07 21:39:23,452 [INFO] [TezChild] |mapReduceLayer.PigCombiner$Combine|: Aliases being processed per job phase (AliasName[line,offset]): null
2016-07-07 21:39:23,860 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Finished spill 0
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Sorting & Spilling map output. bufstart = 0, bufend = 493566, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67102792(268411168), length = 6069/16777216
2016-07-07 21:39:24,127 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Finished spill 0
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Sorting & Spilling map output. bufstart = 0, bufend = 769, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108856(268435424), length = 5/16777216
2016-07-07 21:39:24,148 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Finished spill 0
2016-07-07 21:39:24,151 [INFO] [TezChild] |shuffle.ShuffleUtils|: EmptyPartition bitsetSize=18, numOutputs=20, emptyPartitions=18, compressedSize=11
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Sorting & Spilling map output. bufstart = 0, bufend = 5539516, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67107376(268429504), length = 1485/16777216
2016-07-07 21:39:24,361 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Finished spill 0
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Sorting & Spilling map output. bufstart = 0, bufend = 12169, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108736(268434944), length = 125/16777216
2016-07-07 21:39:24,662 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Finished spill 0
{code}
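Going by the timestamps in the log above, the five flushes take roughly 468 ms
(scope-525), 233 ms (scope-554), 18 ms (scope-512), 209 ms (scope-490) and
299 ms (scope-541), one after another, for about 1.27 s end to end. If the
outputs are independent and there are enough cores, closing them concurrently
would ideally be bounded by the slowest one (about 468 ms here). A minimal
sketch of that idea follows. It is not Tez code: ParallelOutputCloser and
closeAll are hypothetical names, and each output is modelled as a plain
java.io.Closeable whose close() does the sort, combine and spill.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not part of Tez): closes independent outputs concurrently. */
final class ParallelOutputCloser {

    private ParallelOutputCloser() {}

    /**
     * Submits one close() per output to a bounded thread pool and waits for all of them.
     * Every close is attempted; if any fail, the first failure is rethrown at the end.
     */
    static void closeAll(List<? extends Closeable> outputs, int maxThreads)
            throws IOException, InterruptedException {
        if (outputs.isEmpty()) {
            return;
        }
        // Never start more threads than there are outputs to close.
        int threads = Math.max(1, Math.min(maxThreads, outputs.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (Closeable output : outputs) {
                Callable<Void> task = () -> {
                    output.close(); // assumed to perform the sort + combine + spill
                    return null;
                };
                pending.add(pool.submit(task));
            }
            IOException firstFailure = null;
            for (Future<Void> result : pending) {
                try {
                    result.get(); // blocks until this output has finished closing
                } catch (ExecutionException e) {
                    if (firstFailure == null) {
                        Throwable cause = e.getCause();
                        firstFailure = (cause instanceof IOException)
                                ? (IOException) cause
                                : new IOException(cause);
                    }
                }
            }
            if (firstFailure != null) {
                throw firstFailure;
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool is bounded because each close sorts and compresses its own buffer, so
unbounded parallelism could oversubscribe CPU in the task. The right limit
for a real task would have to be measured, not taken from this sketch.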
> Parallelize closing of outputs
> ------------------------------
>
> Key: TEZ-3332
> URL: https://issues.apache.org/jira/browse/TEZ-3332
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
>
> Currently outputs are closed serially, so when there are multiple outputs it
> can take a long time to finish sorting and running the combiner.