Hi,

I have noticed that for shuffled operations (groupBy, join), reducer tasks
are not evenly loaded. Most of them (90%) finish very fast, but there are
some outliers that take much longer, as you can see from the "Max" value in
the metric below. The metric is from a join of two RDDs. I tried
repartitioning both RDDs with a HashPartitioner before the join. It is
certainly faster than before, when I was not repartitioning, but it is
still slow, and it looks like an equal number of records is not being
allocated to each partition. Could this just be the result of data skew, or
is there something else that can be done here?

Summary Metrics for 4000 Completed Tasks

Metric     Min     25th percentile   Median   75th percentile   Max
Duration   89 ms   3 s               7 s      14 s              5.9 min
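For what it's worth, repartitioning with a HashPartitioner can never split a single hot key: every record with the same key hashes to the same partition, so one dominant key produces exactly the kind of outlier task described above. Here is a small illustrative sketch (plain Python standing in for Spark's HashPartitioner, with made-up toy data, not your actual workload) showing how a skewed key distribution survives hash partitioning:

```python
from collections import Counter

def partition_counts(keys, num_partitions):
    """Count records per partition under hash partitioning.

    Mimics Spark's HashPartitioner, which assigns a record to
    partition hashCode(key) % numPartitions. Integer keys are used
    here because Python's hash(int) is deterministic across runs.
    """
    return Counter(hash(k) % num_partitions for k in keys)

# Toy skewed dataset: one hot key (42) carries 90% of all records,
# the remaining 10% are spread evenly over keys 0..999.
keys = [42] * 9000 + list(range(1000))
counts = partition_counts(keys, num_partitions=200)

# All 9000+ hot-key records land in the single partition 42 % 200,
# no matter how many partitions you use -- the task reading that
# partition becomes the slow outlier.
```

If this matches your data, the usual workaround is key salting: append a random suffix to the hot key on the large side, replicate the matching rows on the small side across all suffixes, join, then strip the salt.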
