[ 
https://issues.apache.org/jira/browse/PIG-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282540#comment-15282540
 ] 

liyunzhang_intel commented on PIG-4893:
---------------------------------------

Here is a simple case to describe it:
join.pig
{code}
A = load '/SkewedJoinInput1.txt' as (id,name,n);
B = load '/SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name);
store D into './testFRJoin.out';
{code}

cat SkewedJoinInput1.txt
{code}
100     apple1  aaa
200     orange1 bbb
300     strawberry      ccc
{code}

cat SkewedJoinInput2.txt
{code}
100     apple1
100     apple2
100     apple2
200     orange1
200     orange2
300     strawberry
400     pear
{code}
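
For reference, here is a minimal sketch in plain Python of what the join computes on this input. It is only a stand-in to show how small the workload is, not how Pig or Spark executes the join; `inner_join` is a hypothetical helper, and the rows are copied from the two files above with all fields kept as strings.

```python
# Plain-Python stand-in for: D = join A by (id,name), B by (id,name);
# This only illustrates the result, not Pig's or Spark's execution.

# Rows copied from SkewedJoinInput1.txt: (id, name, n)
A = [
    ("100", "apple1", "aaa"),
    ("200", "orange1", "bbb"),
    ("300", "strawberry", "ccc"),
]

# Rows copied from SkewedJoinInput2.txt: (id, name)
B = [
    ("100", "apple1"),
    ("100", "apple2"),
    ("100", "apple2"),
    ("200", "orange1"),
    ("200", "orange2"),
    ("300", "strawberry"),
    ("400", "pear"),
]


def inner_join(left, right):
    """Inner join on the first two fields (id, name) of each row."""
    # Index the right-hand rows by their (id, name) key.
    index = {}
    for row in right:
        index.setdefault((row[0], row[1]), []).append(row)
    # Emit one concatenated row per matching pair.
    return [l + r for l in left for r in index.get((l[0], l[1]), [])]


joined = inner_join(A, B)
for row in joined:
    print("\t".join(row))
```

Only three rows match on (id, name), so the job has very little actual work to do.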

This job takes 16s to finish, of which task deserialization takes 12s.

||Metric||Min||25th percentile||Median||75th percentile||Max||
|Duration|0.9 s|0.9 s|0.9 s|0.9 s|0.9 s|
|Scheduler Delay|0.2 s|0.2 s|0.2 s|0.2 s|0.2 s|
|Task Deserialization Time|11 s|11 s|11 s|11 s|11 s|
|GC Time|35 ms|35 ms|35 ms|35 ms|35 ms|
|Result Serialization Time|1 ms|1 ms|1 ms|1 ms|1 ms|
|Getting Result Time|0 ms|0 ms|0 ms|0 ms|0 ms|


> Task deserialization time is too long for spark on yarn mode
> ------------------------------------------------------------
>
>                 Key: PIG-4893
>                 URL: https://issues.apache.org/jira/browse/PIG-4893
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: time.PNG
>
>
> I found the task deserialization time is a bit long when I run any of the
> pigmix scripts in Spark on YARN mode. See the attached picture. The duration
> is 3s but the task deserialization takes 13s.
> My env is hadoop2.6+spark1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
