[ https://issues.apache.org/jira/browse/PIG-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282540#comment-15282540 ]
liyunzhang_intel commented on PIG-4893:
---------------------------------------
Here is a simple case to describe it:
join.pig
{code}
A = load '/SkewedJoinInput1.txt' as (id,name,n);
B = load '/SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name);
store D into './testFRJoin.out';
{code}
cat SkewedJoinInput1.txt
{code}
100 apple1 aaa
200 orange1 bbb
300 strawberry ccc
{code}
cat SkewedJoinInput2.txt
{code}
100 apple1
100 apple2
100 apple2
200 orange1
200 orange2
300 strawberry
400 pear
{code}
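To make the expected result of join.pig concrete, here is a minimal Python sketch of an inner join on (id, name) over the two inputs above. It is only an illustration of the join semantics, not how Pig or Spark executes the script. The in-memory lists and the helper name inner_join are assumptions made for this example.

```python
# Hypothetical in-memory copies of the two input files shown above.
A = [
    ("100", "apple1", "aaa"),
    ("200", "orange1", "bbb"),
    ("300", "strawberry", "ccc"),
]
B = [
    ("100", "apple1"),
    ("100", "apple2"),
    ("100", "apple2"),
    ("200", "orange1"),
    ("200", "orange2"),
    ("300", "strawberry"),
    ("400", "pear"),
]


def inner_join(left, right, key_len=2):
    """Inner join on the first key_len fields (here: id, name)."""
    # Index the right-hand rows by their join key.
    index = {}
    for row in right:
        index.setdefault(row[:key_len], []).append(row)
    # Emit one concatenated row per matching (left, right) pair.
    return [l + r for l in left for r in index.get(l[:key_len], [])]


D = inner_join(A, B)
for row in D:
    print("\t".join(row))
```

Only three rows match on both id and name, so the data volume is tiny compared with the time the job takes.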
This job takes 16 s to finish, of which task deserialization takes 12 s.
||Metric||Min||25th percentile||Median||75th percentile||Max||
|Duration|0.9 s|0.9 s|0.9 s|0.9 s|0.9 s|
|Scheduler Delay|0.2 s|0.2 s|0.2 s|0.2 s|0.2 s|
|Task Deserialization Time|11 s|11 s|11 s|11 s|11 s|
|GC Time|35 ms|35 ms|35 ms|35 ms|35 ms|
|Result Serialization Time|1 ms|1 ms|1 ms|1 ms|1 ms|
|Getting Result Time|0 ms|0 ms|0 ms|0 ms|0 ms|
> Task deserialization time is too long for spark on yarn mode
> ------------------------------------------------------------
>
> Key: PIG-4893
> URL: https://issues.apache.org/jira/browse/PIG-4893
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: time.PNG
>
>
> I found that the task deserialization time is rather long when I run any of
> the PigMix scripts in Spark on YARN mode. See the attached picture: the
> duration is 3 s but the task deserialization time is 13 s.
> My environment is Hadoop 2.6 + Spark 1.6.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)