[
https://issues.apache.org/jira/browse/PIG-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4610:
----------------------------------
Attachment: PIG-4610.patch
[~mohitsabharwal],[~kexianda],[~praveenr019],[~xuefuz]:
PIG-4610.patch fixes the following unit test failures:
org.apache.pig.builtin.TestOrcStorage.testJoinWithPruning
org.apache.pig.builtin.TestOrcStorage.testLoadStoreMoreDataType
org.apache.pig.builtin.TestOrcStorage.testMultiStore
Here is an example that explains why these tests failed before the patch.
testOrcStorage.tmp.pig (orc-file-11-format.orc is located at
$PIG_HOME/test/org/apache/pig/builtin/orc/orc-file-11-format.orc):
{code}
A = load './orc-file-11-format.orc' using OrcStorage();
B = foreach A generate int1,string1;
D = limit B 10;
store D into './testOrcStorage.tmp.out';
{code}
The result in Spark mode:
{code}
false 1
false 1
false 1
false 1
false 1
false 1
false 1
false 1
false 1
false 1
{code}
The result in MR mode:
{code}
65536 hi
65536 bye
65536 hi
65536 bye
65536 hi
65536 bye
65536 hi
65536 bye
65536 hi
65536 bye
{code}
A row in orc-file-11-format.orc looks like the one below. The required columns
are the 4th and 9th, which correspond to int1 and string1 in the script (this
information is stored in orc-file-11-format.orc):
{code}
{true, 100, 2048, 65536, 9223372036854775807, 2.0, -5.0, , bye, {[{1, bye}, {2,
sigh}]}, [{100000000, cat}, {-100000, in}, {1234, hat}], {chani={5, chani},
mauddib={1, mauddib}}, 2000-03-12 15:00:01, 12345678.6547457}
{code}
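To make the role of the required columns concrete, here is a minimal sketch of column pruning with a boolean mask. It is an illustration only, not the OrcStorage implementation: the class {{RequiredColumnsSketch}} and the method {{project}} are hypothetical names, and the row holds just the first nine fields of the sample above. It assumes that a missing (null) mask means no pruning takes place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of Pig or of PIG-4610.patch.
class RequiredColumnsSketch {

    // Keeps only the fields whose mask entry is true.
    // Assumption: a null mask means "no pruning", so every field is kept.
    static List<Object> project(List<Object> row, boolean[] required) {
        if (required == null) {
            return new ArrayList<>(row);
        }
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (i < required.length && required[i]) {
                out.add(row.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // First nine fields of the sample row (the 8th field is empty in the dump).
        List<Object> row = Arrays.asList(
                true, 100, 2048, 65536, 9223372036854775807L, 2.0, -5.0, "", "bye");

        boolean[] required = new boolean[9];
        required[3] = true; // 4th column
        required[8] = true; // 9th column

        // With the mask: only the 4th and 9th fields survive.
        System.out.println(project(row, required)); // [65536, bye]

        // Without the mask: nothing is pruned, so the first two positions
        // hold the 1st and 2nd fields instead.
        System.out.println(project(row, null).subList(0, 2)); // [true, 100]
    }
}
```

Under that assumption, a reader that takes the first two positions of an unpruned row gets a boolean and a small integer rather than int1 and string1, which has the same shape as the Spark output shown earlier.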
The difference between Spark and MR arises because
[{{OrcStorage#mRequiredColumns}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/OrcStorage.java#L298]
is not initialized in Spark mode, since
[{{UDFContext.getUDFContext().isFrontend()}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/OrcStorage.java#L296]
returns true. {{UDFContext.getUDFContext().isFrontend()}} returns true because
[{{jconf.get(MRConfiguration.JOB_APPLICATION_ATTEMPT_ID)}}|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/util/UDFContext.java#L238]
is null. PIG-4610.patch sets {{MRConfiguration.JOB_APPLICATION_ATTEMPT_ID}}
in SparkUtil#newJobConf.
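
The chain of cause and effect can be sketched as follows. This is a simplified model, not the Pig source: a plain {{Map}} stands in for the Hadoop job configuration, {{ATTEMPT_ID_KEY}} is a hypothetical stand-in for {{MRConfiguration.JOB_APPLICATION_ATTEMPT_ID}} (the real key string may differ), and the value "1" is a placeholder, since the value the patch actually sets is not stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the frontend check; not the Pig implementation.
class FrontendCheckSketch {

    // Stand-in for MRConfiguration.JOB_APPLICATION_ATTEMPT_ID (assumed key string).
    static final String ATTEMPT_ID_KEY = "job.application.attempt.id";

    // Mirrors the check described above: no attempt id in the job conf
    // is treated as "running on the frontend".
    static boolean isFrontend(Map<String, String> jobConf) {
        return jobConf == null || jobConf.get(ATTEMPT_ID_KEY) == null;
    }

    // Models the loader: the required-columns mask is restored only on the backend.
    static boolean[] loadRequiredColumns(Map<String, String> jobConf, boolean[] serialized) {
        return isFrontend(jobConf) ? null : serialized;
    }

    public static void main(String[] args) {
        boolean[] serialized = {false, false, false, true};

        // Before the fix: the Spark job conf carries no attempt id,
        // so the mask is never restored.
        Map<String, String> confBefore = new HashMap<>();
        System.out.println(loadRequiredColumns(confBefore, serialized) == null); // true

        // After the fix: the attempt id is set when the job conf is created,
        // so the backend branch runs and the mask is restored.
        Map<String, String> confAfter = new HashMap<>();
        confAfter.put(ATTEMPT_ID_KEY, "1"); // placeholder value
        System.out.println(loadRequiredColumns(confAfter, serialized) != null); // true
    }
}
```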
> Enable "TestOrcStorage" unit test in spark mode
> -----------------------------------------------
>
> Key: PIG-4610
> URL: https://issues.apache.org/jira/browse/PIG-4610
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4610.patch
>
>
> https://builds.apache.org/job/Pig-spark/222/#showFailuresLink shows the
> following unit test failures in "TestOrcStorage":
> org.apache.pig.builtin.TestOrcStorage.testJoinWithPruning
> org.apache.pig.builtin.TestOrcStorage.testLoadStoreMoreDataType
> org.apache.pig.builtin.TestOrcStorage.testMultiStore