[ 
https://issues.apache.org/jira/browse/TEZ-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368598#comment-15368598
 ] 

Rohini Palaniswamy commented on TEZ-3332:
-----------------------------------------

The example below is on tiny data, so it finished fast. For larger data, 
parallelizing can provide a considerable speedup.

{code}
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Sorting & Spilling map output. bufstart = 0, bufend = 4091674, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67104732(268418928), length = 4129/16777216
2016-07-07 21:39:23,419 [INFO] [TezChild] |compress.CodecPool|: Got brand-new compressor [.lzo_deflate]
2016-07-07 21:39:23,452 [INFO] [TezChild] |mapReduceLayer.PigCombiner$Combine|: Aliases being processed per job phase (AliasName[line,offset]): null
2016-07-07 21:39:23,860 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Finished spill 0
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Sorting & Spilling map output. bufstart = 0, bufend = 493566, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67102792(268411168), length = 6069/16777216
2016-07-07 21:39:24,127 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Finished spill 0
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Sorting & Spilling map output. bufstart = 0, bufend = 769, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108856(268435424), length = 5/16777216
2016-07-07 21:39:24,148 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Finished spill 0
2016-07-07 21:39:24,151 [INFO] [TezChild] |shuffle.ShuffleUtils|: EmptyPartition bitsetSize=18, numOutputs=20, emptyPartitions=18, compressedSize=11
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Sorting & Spilling map output. bufstart = 0, bufend = 5539516, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67107376(268429504), length = 1485/16777216
2016-07-07 21:39:24,361 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Finished spill 0
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Sorting & Spilling map output. bufstart = 0, bufend = 12169, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108736(268434944), length = 125/16777216
2016-07-07 21:39:24,662 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Finished spill 0
{code}
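
In the log above the five flushes run back to back on the single TezChild 
thread. Measured from "Starting flush" to "Finished spill", they take about 
468, 233, 18, 209 and 299 ms, roughly 1.27 s end to end. If they ran 
concurrently, the wall time could approach that of the longest single flush 
(about 468 ms), assuming there is enough CPU and disk headroom. A minimal 
sketch of the idea follows. It is not Tez code and not the eventual patch: 
ClosableOutput, ParallelOutputCloser and closeAll are hypothetical names made 
up for illustration, and the sketch assumes each output's close() is safe to 
call from its own thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelOutputCloser {

    /** Hypothetical stand-in for a task output whose close() sorts, combines and spills. */
    public interface ClosableOutput {
        String name();
        void close() throws Exception;
    }

    /**
     * Closes all outputs on a fixed-size thread pool and waits for every close to finish.
     * Returns the names of the closed outputs in the same order as the input list.
     */
    public static List<String> closeAll(List<ClosableOutput> outputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (ClosableOutput out : outputs) {
                tasks.add(() -> {
                    out.close();          // the expensive part: sort, combine, spill
                    return out.name();
                });
            }
            List<String> closed = new ArrayList<>();
            // invokeAll blocks until every task has completed and keeps task order.
            for (Future<String> done : pool.invokeAll(tasks)) {
                closed.add(done.get());   // surfaces a failed close as ExecutionException
            }
            return closed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while closing outputs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An output failed to close", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling closeAll(outputs, outputs.size()) gives every output its own thread; a 
smaller pool size caps how many sorts run at once, which may matter when the 
sort buffers are large.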

> Parallelize closing of outputs
> ------------------------------
>
>                 Key: TEZ-3332
>                 URL: https://issues.apache.org/jira/browse/TEZ-3332
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>
> Currently outputs are closed serially, so when a task has multiple outputs 
> it can take a long time to finish sorting and running the combiner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
