[
https://issues.apache.org/jira/browse/PIG-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4614:
----------------------------------
Attachment: PIG-4614.patch
[~mohitsabharwal],[~praveenr019],[~kexianda],[~xuefuz]:
PIG-4614.patch fixes the following unit test failure:
org.apache.pig.newplan.logical.relational.TestLocationInPhysicalPlan.test
The changes in PIG-4614.patch are:
1. Add org.apache.pig.tools.pigstats.spark.SparkScriptState. SparkScriptState
encapsulates the settings for a Pig script that runs on a hadoop cluster.
These settings are added to all Spark jobs spawned by the script and, in turn,
are persisted in the hadoop job xml.
2. The alias, aliasLocations and featureSet are stored in
SparkScriptState.SparkScriptInfo.DAGAliasVisitor#aliases,
SparkScriptState.SparkScriptInfo.DAGAliasVisitor#aliasLocations and
SparkScriptState.SparkScriptInfo.DAGAliasVisitor#featureSet.
The following example shows what alias, aliasLocations and featureSet contain:
{code}
A = LOAD '/tmp/pig_junit_tmp855375838/test9183536755838741852input' using PigStorage();
B = GROUP A BY $0;
A = FOREACH B GENERATE COUNT(A);
STORE A INTO '/tmp/pig_junit_tmp855375838/test3028903370752266437output';
{code}
{code}
Alias:
A,B
Alias_Location:
A[1,4],B[2,4],A[3,4]
FeatureSet:GROUP_BY
{code}
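As a rough illustration of how the three values above could be derived, here is a minimal, self-contained Java sketch. It is not the code in PIG-4614.patch: the class AliasSummarySketch, the Op type and the summarize method are hypothetical stand-ins, the input operators are hard-coded from the example script, and the insertion-ordered de-duplication of aliases is an assumption made only to reproduce the output shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an alias-collecting visitor; NOT the patch's DAGAliasVisitor.
final class AliasSummarySketch {

    /** One operator of a plan: its alias, its script location, and an optional feature. */
    static final class Op {
        final String alias;
        final int line;
        final int offset;
        final String feature; // null when the operator contributes no feature

        Op(String alias, int line, int offset, String feature) {
            this.alias = alias;
            this.line = line;
            this.offset = offset;
            this.feature = feature;
        }
    }

    // Assumption: aliases are de-duplicated in first-seen order; locations keep every occurrence.
    final Set<String> aliases = new LinkedHashSet<>();
    final List<String> aliasLocations = new ArrayList<>();
    final Set<String> featureSet = new LinkedHashSet<>();

    /** Records one operator, formatting its location as alias[line,offset]. */
    void visit(Op op) {
        aliases.add(op.alias);
        aliasLocations.add(op.alias + "[" + op.line + "," + op.offset + "]");
        if (op.feature != null) {
            featureSet.add(op.feature);
        }
    }

    /** Visits every operator in order and returns the collected summary. */
    static AliasSummarySketch summarize(List<Op> ops) {
        AliasSummarySketch summary = new AliasSummarySketch();
        for (Op op : ops) {
            summary.visit(op);
        }
        return summary;
    }

    public static void main(String[] args) {
        // Operators hard-coded from the example script: LOAD, GROUP, FOREACH.
        List<Op> ops = Arrays.asList(
                new Op("A", 1, 4, null),
                new Op("B", 2, 4, "GROUP_BY"),
                new Op("A", 3, 4, null));
        AliasSummarySketch s = summarize(ops);
        System.out.println("Alias: " + String.join(",", s.aliases));
        System.out.println("Alias_Location: " + String.join(",", s.aliasLocations));
        System.out.println("FeatureSet: " + String.join(",", s.featureSet));
    }
}
```

The alias A appears once in Alias but twice in Alias_Location because it is defined on both line 1 and line 3 of the script.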
3. How are alias, aliasLocations and featureSet set?
They are collected through the following call stack:
{code}
org.apache.pig.tools.pigstats.spark.SparkPigStats#initialize
org.apache.pig.tools.pigstats.spark.SparkScriptState#setScriptInfo
org.apache.pig.tools.pigstats.spark.SparkScriptState.SparkScriptInfo#initialize
org.apache.pig.tools.pigstats.spark.SparkScriptState.SparkScriptInfo.DAGAliasVisitor#visit
{code}
The alias, aliasLocations and featureSet are held by the object returned from
org.apache.pig.tools.pigstats.spark.SparkScriptState#getScriptInfo, and this
information is set in
org.apache.pig.tools.pigstats.spark.SparkJobStats#setAlias through the following
call stack:
{code}
org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher#launchPig
org.apache.pig.tools.pigstats.spark.SparkPigStats#finish
org.apache.pig.tools.pigstats.spark.SparkPigStats#display
org.apache.pig.tools.pigstats.spark.SparkJobStats#setAlias
{code}
> Enable "TestLocationInPhysicalPlan" in spark mode
> -------------------------------------------------
>
> Key: PIG-4614
> URL: https://issues.apache.org/jira/browse/PIG-4614
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4614.patch
>
>
> https://builds.apache.org/job/Pig-spark/228/#showFailuresLink shows that the
> following unit test fails:
> org.apache.pig.newplan.logical.relational.TestLocationInPhysicalPlan.test