[ 
https://issues.apache.org/jira/browse/PIG-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4283:
----------------------------------
    Attachment: PIG-4283.patch

[~mohitsabharwal],[~xuefuz],[~praveenr019]:
PIG-4283.patch fixes following four unit test failures:
TestGrunt#testAutoShipUDFContainingJar
TestGrunt#testKeepGoigFailed
TestScriptLanguage#testSysArguments
TestScriptLanguage#runParallelTest2

Changes in PIG-4283.patch are:
        1. change shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java 
to generate core-site.xml, hdfs-site.xml, mapred-site.xml and yarn=site.xml in 
hadoop2 env. 
        2. fix org.apache.pig.test.TestGrunt#testAutoShipUDFContainingJar       
org.apache.pig.test.TestGrunt#testKeepGoigFailed
        
*why TestGrunt#testAutoShipUDFContainingJar failed?*
The reason why TestGrunt#testAutoShipUDFContainingJar failed in previous code 
is because "it can not find table_testAutoShipUDFContainingJar". 
table_testAutoShipUDFContainingJar is located in 
hdfs://xxxx:/user/root/table_testAutoShipUDFContainingJar.   It finds 
table_testAutoShipUDFContainingJar in local file system not hadoop file system 
because the hadoop env is not correct(the value of FS_DEFAULT_NAME_KEY in 
core-site.xml is "file:///" not "hdfs://xxxx:8020". 
        
Even we fix the hadoop2 env problem in 
shims/test/hadoop23/org/apache/pig/test/SparkMiniCluster.java, 
TestGrunt#testAutoShipUDFContainingJar  fails in  
[{{assertTrue(found);}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1506]
        
That's because jar loaded info is not shown in the 
[{{pri.stderrContent}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L1495].
  Then I add SparkLauncher#addJarToSparkJobWorkingDirectory  and let "ADDED JAR 
xxxx" info appear in the {{pri.stderrContent}}.

*why TestGrunt#testKeepGoigFailed fails?*
following script should fail because "B = stream A through `false`;"( "file 
false not exists".)
        {code}
          String strCmd =
                    "rmf bar;"
                    +"rmf foo;"
                    +"rmf baz;"
                    +"A = load 'passwd';"
                    +"B = foreach A generate 1;"
                    +"C = foreach A generate 0/0;"
                    +"store B into 'foo';"
                    +"store C into 'bar';"
                    +"A = load 'passwd';"
                    +"B = stream A through `false`;"
                    +"store B into 'baz';"
                    +"cat baz;";
        {code}

stream grammer:
{code}
alias = STREAM alias [, alias …] THROUGH {'command' | cmd_alias } [AS schema] ;
{code}

when script fails, 
[exception|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L936]
 'baz does not exist' should be thrown out because in mr when the job fails, it 
will automatically delete the output directory "baz" and 
[{{caught}}|https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/TestGrunt.java#L931]
 is true.

in spark, when job fails or not, it will not automatically delete the output 
directory(see [SPARK_5836|https://issues.apache.org/jira/browse/SPARK-5836] and 
{{caught}} is false. So in spark mode,the judgement of {{caught}} is true is 
skipped to make the unit test pass.


> Enable unit test "TestGrunt" for spark
> --------------------------------------
>
>                 Key: PIG-4283
>                 URL: https://issues.apache.org/jira/browse/PIG-4283
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4283.patch, TEST-org.apache.pig.test.TestGrunt.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to