[ https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645119#comment-15645119 ]
Adam Szita commented on PIG-5052:
---------------------------------
You can reproduce the problem with the following:
{code}
./pig -x spark_local
A = LOAD '../test/org/apache/pig/test/data/passwd' using PigStorage();
dump A;
dump A; -- this second dump hangs
{code}
The second dump hangs for me. Because both jobs run under the same job group ID,
the call at JobGraphBuilder#225 returns both job 0 and job 1:
{code}
sparkContext.statusTracker().getJobIdsForGroup(jobGroupID)
{code}
...but JobMetricsListener only has job 1 in finishedJobIds at this point:
{code}
public synchronized boolean waitForJobToEnd(int jobId) throws InterruptedException {
    if (finishedJobIds.contains(jobId)) {
        finishedJobIds.remove(jobId);
        return true;
    }
    wait();
    return false;
}
{code}
So after the second dump we keep waiting for job 0, which was already removed from
finishedJobIds after the first dump and will never be added again.
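The sequence above can be modeled without Spark. The sketch below is a hypothetical, single-threaded model, not Pig or Spark code: `SharedGroupHangDemo`, `runJob`, `tryConsumeFinished` and `dump` are invented names, `jobsByGroup` stands in for what `getJobIdsForGroup` reports, and `tryConsumeFinished` is a non-blocking stand-in for `waitForJobToEnd`. It shows that with one shared group ID the second dump is left waiting on job 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the hang; names are invented for illustration.
public class SharedGroupHangDemo {
    // Stand-in for Spark's status tracker: group ID -> every job ID started in that group.
    private final Map<String, List<Integer>> jobsByGroup = new HashMap<>();
    // Stand-in for JobMetricsListener#finishedJobIds.
    private final Set<Integer> finishedJobIds = new HashSet<>();
    private int nextJobId = 0;

    // Simulates submitting one job under the given group; in this model it finishes immediately.
    int runJob(String groupId) {
        int jobId = nextJobId++;
        jobsByGroup.computeIfAbsent(groupId, g -> new ArrayList<>()).add(jobId);
        finishedJobIds.add(jobId);
        return jobId;
    }

    // Non-blocking stand-in for waitForJobToEnd: removes the ID if it is recorded as finished.
    boolean tryConsumeFinished(int jobId) {
        return finishedJobIds.remove(jobId);
    }

    // Runs one "dump" and returns the job IDs it could never find (empty means it completes).
    List<Integer> dump(String groupId) {
        runJob(groupId);
        List<Integer> stuck = new ArrayList<>();
        for (int jobId : jobsByGroup.get(groupId)) {
            if (!tryConsumeFinished(jobId)) {
                // The real code would keep calling wait() for this ID: the hang.
                stuck.add(jobId);
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        SharedGroupHangDemo demo = new SharedGroupHangDemo();
        System.out.println("first dump stuck on:  " + demo.dump("pig-group"));
        System.out.println("second dump stuck on: " + demo.dump("pig-group"));
    }
}
```

Running `main` prints `[]` for the first dump and `[0]` for the second: job 0 is reported again for the shared group but was already consumed by the first dump.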
Beyond fixing this, I think using a distinct job group ID for each job is a clearer
approach.
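A hypothetical sketch of the per-job alternative, again as a Spark-free model with invented names (`PerJobGroupDemo`, `newJobGroupId`, the `pig-job-group-` prefix): every dump gets a fresh group ID, so the group lookup returns only the job just submitted, and that ID is always found in the finished set. In real code the group would presumably be set through Spark's `JavaSparkContext#setJobGroup(groupId, description)` before the action runs; that wiring is an assumption here, not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: each job runs under its own group ID.
public class PerJobGroupDemo {
    private static final AtomicInteger GROUP_COUNTER = new AtomicInteger();

    private final Map<String, List<Integer>> jobsByGroup = new HashMap<>();
    private final Set<Integer> finishedJobIds = new HashSet<>();
    private int nextJobId = 0;

    // Hypothetical helper: returns a group ID that has never been handed out before.
    static String newJobGroupId() {
        return "pig-job-group-" + GROUP_COUNTER.incrementAndGet();
    }

    // Runs one "dump" under a fresh group and returns the job IDs the group lookup reports.
    List<Integer> dump() {
        String groupId = newJobGroupId();
        int jobId = nextJobId++;
        jobsByGroup.computeIfAbsent(groupId, g -> new ArrayList<>()).add(jobId);
        finishedJobIds.add(jobId); // the job finishes immediately in this model
        List<Integer> ids = jobsByGroup.get(groupId);
        for (int id : ids) {
            if (!finishedJobIds.remove(id)) {
                throw new IllegalStateException("would wait forever on job " + id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        PerJobGroupDemo demo = new PerJobGroupDemo();
        System.out.println(demo.dump());
        System.out.println(demo.dump());
    }
}
```

Running `main` prints `[0]` and then `[1]`; neither dump sees a job from an earlier action, so nothing is left to wait on.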
> Initialize MRConfiguration.JOB_ID in spark mode correctly
> ---------------------------------------------------------
>
> Key: PIG-5052
> URL: https://issues.apache.org/jira/browse/PIG-5052
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5052.2.patch, PIG-5052.patch
>
>
> Currently, we initialize MRConfiguration.JOB_ID in SparkUtil#newJobConf,
> where we just set it to a random string:
> {code}
> jobConf.set(MRConfiguration.JOB_ID, UUID.randomUUID().toString());
> {code}
> We need to find a Spark API to initialize it correctly.
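The description only states the goal. As one hypothetical illustration (an assumption, not what the attached patches do), if the value should follow Hadoop's textual job ID layout, `job_<identifier>_<number>` with the number zero-padded to at least four digits as `org.apache.hadoop.mapred.JobID#toString` produces, it could be built from Spark-provided values instead of a random UUID. The class `HadoopStyleJobId`, its `format` method, and the sample inputs are invented for this sketch.

```java
// Hypothetical helper, not Pig or Hadoop code: formats a Hadoop-style job ID string.
public class HadoopStyleJobId {
    // Layout: "job_" + identifier + "_" + number zero-padded to at least 4 digits.
    static String format(String identifier, int number) {
        // Reject '_' so the string can still be split back into its parts on '_'.
        if (identifier.isEmpty() || identifier.contains("_")) {
            throw new IllegalArgumentException("identifier must be non-empty and contain no '_'");
        }
        if (number < 0) {
            throw new IllegalArgumentException("number must be >= 0");
        }
        return String.format("job_%s_%04d", identifier, number);
    }

    public static void main(String[] args) {
        // Assumed inputs: a per-application identifier and the Spark job ID of the current job.
        System.out.println(format("20161107", 3));
        System.out.println(format("20161107", 12345));
    }
}
```

This prints `job_20161107_0003` and `job_20161107_12345`; `%04d` pads to four digits but never truncates longer numbers.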
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)