[
https://issues.apache.org/jira/browse/PIG-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Charles updated PIG-2368:
--------------------------------
Description:
When executed with Penny, a given script without schema specified doesn't
convert the values to proper type and therefore the operators fed by the agent
output incorrectly compute the result.
For reference:
copy a file with the content below to /home/hadoop/tablea.
0;4;2
1;3;3
2;2;0
3;1;4
4;0;1
script.pig content:
a = load '/home/hadoop/tablea' using PigStorage(';');
b = filter a by $2 < 1000;
store b into '/home/hadoop/tablea.out';
PENNY:
Command Line:
java -cp
/var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar
org.apache.pig.penny.apps.ri.Main script.pig b 2
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
Output(s):
Successfully stored 0 records in: "/home/hadoop/tablea.out"
Counters:
Total records written : 0 [OUTPUT FILE IS EMPTY]
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
----------------------
Using the same environment and running pig without penny
Command Line:
pig script.pig
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
Output(s):
Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"
Counters:
Total records written : 5
Total bytes written : 30
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
================
It happens because when the plan is rebuilt in penny, it fails to proper add
the type of the column which is recognized as bytearray instead of integer.
similar issue can be seen for the -nop- application too.
was:
When executed with penny a script that has no schema especified doesnt convert
the values to proper type and therefore mess up with he result.
For reference:
copy a file with the content below to /home/hadoop/tablea.
0;4;2
1;3;3
2;2;0
3;1;4
4;0;1
script.pig content:
a = load '/home/hadoop/tablea' using PigStorage(';');
b = filter a by $2 < 1000;
store b into '/home/hadoop/tablea.out';
PENNY:
Command Line:
java -cp
/var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar
org.apache.pig.penny.apps.ri.Main script.pig b 2
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
Output(s):
Successfully stored 0 records in: "/home/hadoop/tablea.out"
Counters:
Total records written : 0 [OUTPUT FILE IS EMPTY]
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
----------------------
Using the same environment and running pig without penny
Command Line:
pig script.pig
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
Output(s):
Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"
Counters:
Total records written : 5
Total bytes written : 30
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
================
It happens because when the plan is rebuilt in penny, it fails to proper add
the type of the column which is recognized as bytearray instead of integer.
similar issue can be seen for the -nop- application too.
> Penny doesn't recognize the schema properly
> -------------------------------------------
>
> Key: PIG-2368
> URL: https://issues.apache.org/jira/browse/PIG-2368
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.1
> Environment: Testing in a cluster with 6 nodes
> Debian linux 64bit,
> java 1.6,
> Hadoop 0.20.2
> Reporter: Daniel Charles
> Priority: Minor
>
> When executed with Penny, a given script without schema specified doesn't
> convert the values to proper type and therefore the operators fed by the
> agent output incorrectly compute the result.
> For reference:
> copy a file with the content below to /home/hadoop/tablea.
> 0;4;2
> 1;3;3
> 2;2;0
> 3;1;4
> 4;0;1
> script.pig content:
> a = load '/home/hadoop/tablea' using PigStorage(';');
> b = filter a by $2 < 1000;
> store b into '/home/hadoop/tablea.out';
> PENNY:
> Command Line:
> java -cp
> /var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar
> org.apache.pig.penny.apps.ri.Main script.pig b 2
> Output summary:
> Input(s):
> Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
> Output(s):
> Successfully stored 0 records in: "/home/hadoop/tablea.out"
> Counters:
> Total records written : 0 [OUTPUT FILE IS EMPTY]
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> 11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
> ----------------------
> Using the same environment and running pig without penny
> Command Line:
> pig script.pig
> Output summary:
> Input(s):
> Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
> Output(s):
> Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"
> Counters:
> Total records written : 5
> Total bytes written : 30
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> ================
> It happens because when the plan is rebuilt in penny, it fails to proper add
> the type of the column which is recognized as bytearray instead of integer.
> similar issue can be seen for the -nop- application too.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira