[
https://issues.apache.org/jira/browse/PIG-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4283:
----------------------------------
Attachment: PIG-4283.patch
[~mohitsabharwal],[~xuefuz],[~praveenr019]:
PIG-4283.patch fixes following four unit test failures:
TestGrunt#testAutoShipUDFContainingJar
TestGrunt#testKeepGoigFailed
TestScriptLanguage#testSysArguments
TestScriptLanguage#runParallelTest2
Changes in PIG-4283.patch are:
1. change shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java
to generate core-site.xml, hdfs-site.xml, mapred-site.xml and yarn=site.xml in
hadoop2 env.
2. fix org.apache.pig.test.TestGrunt#testAutoShipUDFContainingJar
org.apache.pig.test.TestGrunt#testKeepGoigFailed
*why TestGrunt#testAutoShipUDFContainingJar failed?*
The reason why TestGrunt#testAutoShipUDFContainingJar failed in previous code
is because "it can not find table_testAutoShipUDFContainingJar".
table_testAutoShipUDFContainingJar is located in
hdfs://xxxx:/user/root/table_testAutoShipUDFContainingJar. It finds
table_testAutoShipUDFContainingJar in local file system not hadoop file system
because the hadoop env is not correct(the value of FS_DEFAULT_NAME_KEY in
core-site.xml is "file:///" not "hdfs://xxxx:8020".
Even we fix the hadoop2 env problem in
shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java,
TestGrunt#testAutoShipUDFContainingJar fails in
[{{assertTrue(found);}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1506]
That's because jar loaded info is not shown in the
[{{pri.stderrContent}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1495].
Then I add SparkLauncher#addJarToSparkJobWorkingDirectory and let "ADDED JAR
xxxx" info appear in the {{pri.stderrContent}}.
*why TestGrunt#testKeepGoigFailed fails?*
following script should fail because "B = stream A through `false`;"( "file
false not exists".)
{code}
String strCmd =
"rmf bar;"
+"rmf foo;"
+"rmf baz;"
+"A = load 'passwd';"
+"B = foreach A generate 1;"
+"C = foreach A generate 0/0;"
+"store B into 'foo';"
+"store C into 'bar';"
+"A = load 'passwd';"
+"B = stream A through `false`;"
+"store B into 'baz';"
+"cat baz;";
{code}
stream grammer:
{code}
alias = STREAM alias [, alias …] THROUGH {'command' | cmd_alias } [AS schema] ;
{code}
when script fails,
[exception|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L936]
'baz does not exist' should be thrown out because in mr when the job fails, it
will automatically delete the output directory "baz" and
[{{caught}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L931]
is true.
in spark, when job fails or not, it will not automatically delete the output
directory(see [SPARK_5836|https://issues.apache.org/jira/browse/SPARK-5836] and
{{caught}} is false. So in spark mode,the judgement of {{caught}} is true is
skipped to make the unit test pass.
> Enable unit test "TestGrunt" for spark
> --------------------------------------
>
> Key: PIG-4283
> URL: https://issues.apache.org/jira/browse/PIG-4283
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4283.patch, TEST-org.apache.pig.test.TestGrunt.txt
>
>
> error log is attached
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)