[
https://issues.apache.org/jira/browse/PIG-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-4175:
------------------------------------
Attachment: PIG-4175-Debug.patch
[~daijy],
I was trying to enhance the test to compare actual results so that the test is
more foolproof, but found that the output of the CROSS was all bytearray even
though the dumped schema is as expected:
C: {long}
D: {A::a0: int,A::a1: chararray,long}
Am I missing something? Attaching the debug patch.
> PIG CROSS operation follow by STORE produces non-deterministic results each
> run
> -------------------------------------------------------------------------------
>
> Key: PIG-4175
> URL: https://issues.apache.org/jira/browse/PIG-4175
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11, 0.12.0
> Environment: RHEL 6/64-bit
> Reporter: Jim Huang
> Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4175-1.patch, PIG-4175-Debug.patch, mktestdata.py,
> pig_testcross_plan.png, test_cross.out, test_cross.pig
>
>
> Three files will be attached to help visualize this issue.
> 1. mktestdata.py - to generate test data to feed the pig script
> 2. test_cross.pig - the PIG script using CROSS and STORE
> 3. test_cross.out - the PIG console output showing the delta between the
> input and output record counts
> To reproduce this PIG CROSS operation problem, you need to use the supplied
> Python script,
> mktestdata.py, to generate an input file that is at least 13,948,228,930
> bytes (> 13GB).
> The CROSS between raw_data (m records) and cross_count (1 record) should
> yield exactly m records as the output.
> Instead, the STORE results from the CROSS operations yielded only about 1/3
> of the input records in raw_data as the output.
> If I joined both of the CROSS operations together, the STORE results from
> the CROSS operations yielded about 2/3
> of the input records in raw_data as the output.
> -- data = CROSS raw_data, field04s_count, subsection1_field04s_count,
> subsection2_field04s_count;
> We have reproduced this using both Pig 0.11 (Hadoop 1.x) and Pig 0.12 (Hadoop
> 2.x) clusters.
> The default HDFS block size is 128MB.
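
The expected cardinality described in the report can be sketched in plain
Python. This is only an illustration of the invariant, not Pig's
implementation: the cross() helper and the sample rows are hypothetical, and
the relation names are borrowed from the description above.

```python
from itertools import product


def cross(*relations):
    """Return the full cross product of the relations as flat tuples."""
    # product() yields one tuple of rows per combination; sum(..., ())
    # concatenates those rows into a single flat output tuple.
    return [sum(rows, ()) for rows in product(*relations)]


# Hypothetical stand-ins for the relations named in the report.
raw_data = [(i, "row%d" % i) for i in range(5)]  # m = 5 records
cross_count = [(len(raw_data),)]                 # exactly 1 record

data = cross(raw_data, cross_count)

# m records crossed with 1 record must give exactly m records.
assert len(data) == len(raw_data)

# Crossing with several single-record relations keeps the count at m.
many = cross(raw_data, cross_count, cross_count, cross_count)
assert len(many) == len(raw_data)
```

Any stored output with fewer than m records, such as the 1/3 and 2/3 counts
reported above, violates this invariant regardless of input size.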