[
https://issues.apache.org/jira/browse/PIG-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Giovanni Botta updated PIG-3987:
--------------------------------
Description:
I ran into a very strange issue with one of my Pig scripts. I described it in
this Stack Overflow question:
http://stackoverflow.com/questions/24047572/strange-cast-error-in-pig-hadoop
Here it is:
I have the following script:
{code}
br = LOAD 'cfs:///somedata';
SPLIT br INTO s0 IF (sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF (adp >= 1.0), not_s1 OTHERWISE;
SPLIT not_s1 INTO s2 IF (p > 1L), not_s2 OTHERWISE;
SPLIT not_s2 INTO s3 IF (s > 0L), s4 OTHERWISE;
tmp0 = FOREACH s0 GENERATE b, 'x' as seg;
tmp1 = FOREACH s1 GENERATE b, 'y' as seg;
tmp2 = FOREACH s2 GENERATE b, 'z' as seg;
tmp3 = FOREACH s3 GENERATE b, 'w' as seg;
tmp4 = FOREACH s4 GENERATE b, 't' as seg;
out = UNION ONSCHEMA tmp0, tmp1, tmp2, tmp3, tmp4;
dump out;
{code}
The file loaded into `br` was generated by a previous Pig script and has an
embedded schema (a .pig_schema file):
{code}
describe br
br: {b: chararray, p: long, afternoon: long, ddv: long, pa: long, s: long,
t0002: long, t0204: long, t0406: long, t0608: long, t0810: long, t1012: long,
t1214: long, t1416: long, t1618: long, t1820: long, t2022: long, t2200: long,
browser_software: chararray, first_timestamp: long, last_timestamp: long,
os: chararray, platform: chararray, sp: int, adp: double}
{code}
Some irrelevant fields were edited out of the above (I can't fully disclose the
nature of the data at this time).
The script fails with the following error:
{code}
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Integer cannot be cast to java.lang.Long
{code}
However, dumping `s0`, `s1`, `s2`, `s3`, `s4` or `tmp0`, `tmp1`, `tmp2`, `tmp3`,
`tmp4` works flawlessly.
The Hadoop job tracker shows the following error 4 times:
{code}
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
    at java.lang.Long.compareTo(Long.java:50)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.doComparison(EqualToExpr.java:116)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:83)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
{code}
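The trace dies inside EqualToExpr, presumably while evaluating `sp == 1` (the
only equality test in the SPLITs), even though `sp` is declared as an int. One
thing I plan to try, as an untested sketch, is spelling out the operand types
with explicit casts so each SPLIT condition's types can't be second-guessed:
{code}
-- Untested sketch: make the operand types of every SPLIT condition explicit.
SPLIT br INTO s0 IF ((int)sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF ((double)adp >= 1.0), not_s1 OTHERWISE;
SPLIT not_s1 INTO s2 IF ((long)p > 1L), not_s2 OTHERWISE;
SPLIT not_s2 INTO s3 IF ((long)s > 0L), s4 OTHERWISE;
{code}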
I also tried this snippet (instead of the original `dump`):
{code}
x = UNION s1,s2;
y = FOREACH x GENERATE b;
dump y;
{code}
and got a different (but, I assume, related) error:
{code}
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.lang.Double cannot be cast to java.lang.Long
{code}
with the job tracker error (repeated 4 times):
{code}
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Long
    at java.lang.Long.compareTo(Long.java:50)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.doComparison(GTOrEqualToExpr.java:111)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr.getNext(GTOrEqualToExpr.java:78)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:141)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
{code}
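This second trace fails in GTOrEqualToExpr, presumably on `adp >= 1.0`, which
should be an all-double comparison, so both failures look like the declared
field types getting lost or shifted somewhere between the embedded schema and
the physical plan. Another sketch I have not verified (in particular, I don't
know whether an inline AS clause takes precedence over the embedded
.pig_schema) is to restate the types at load time:
{code}
-- Unverified sketch: declare the types inline instead of relying on the
-- embedded .pig_schema; field list abbreviated to the fields used above.
br = LOAD 'cfs:///somedata' AS (b: chararray, p: long, s: long, sp: int, adp: double);
{code}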
I don't think I have a data quality issue: I successfully ran the following
snippet (picking up right after the definitions of s0, s1, ...):
{code}
tmp0 = FOREACH s0 GENERATE *, 'x' as seg;
tmp1 = FOREACH s1 GENERATE *, 'y' as seg;
tmp2 = FOREACH s2 GENERATE *, 'z' as seg;
tmp3 = FOREACH s3 GENERATE *, 'w' as seg;
tmp4 = FOREACH s4 GENERATE *, 't' as seg;
br_seg = UNION ONSCHEMA tmp0, tmp1, tmp2, tmp3, tmp4;
breakdown = FOREACH (GROUP br_seg BY seg) {
ddb = FILTER br_seg BY (ddv > 0L);
desktop = FILTER br_seg BY (platform == 'd');
mobile = FILTER br_seg BY (platform == 'm');
p_br = FILTER br_seg BY (sp == 1);
tablet = FILTER br_seg BY (platform == 't');
GENERATE group as seg,
COUNT(br_seg) as br,
SUM(br_seg.p) as p,
COUNT(ddb) as ddb,
COUNT(desktop) as desktop,
COUNT(mobile) as mobile,
COUNT(p_br) as p_br,
COUNT(tablet) as tablet,
SUM(br_seg.ddv) as ddv,
SUM(br_seg.pa) as pa,
SUM(br_seg.t0002) as t0002,
SUM(br_seg.t0204) as t0204,
SUM(br_seg.t0406) as t0406,
SUM(br_seg.t0608) as t0608,
SUM(br_seg.t0810) as t0810,
SUM(br_seg.t1012) as t1012,
SUM(br_seg.t1214) as t1214,
SUM(br_seg.t1416) as t1416,
SUM(br_seg.t1618) as t1618,
SUM(br_seg.t1820) as t1820,
SUM(br_seg.t2022) as t2022,
SUM(br_seg.t2200) as t2200;
}
dump breakdown;
{code}
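For anyone trying to reproduce this without my data, a minimal script along
these lines (hypothetical input file and values) should exercise the same
SPLIT + UNION ONSCHEMA path, though if the bug depends on the .pig_schema file
it may not trigger:
{code}
-- Hypothetical minimal repro; 'input.tsv' is a made-up tab-separated file,
-- e.g. a line like:  a	2	1.5	0
r = LOAD 'input.tsv' AS (b: chararray, p: long, adp: double, sp: int);
SPLIT r INTO s0 IF (sp == 1), not_s0 OTHERWISE;
SPLIT not_s0 INTO s1 IF (adp >= 1.0), s2 OTHERWISE;
t0 = FOREACH s0 GENERATE b, 'x' as seg;
t1 = FOREACH s1 GENERATE b, 'y' as seg;
t2 = FOREACH s2 GENERATE b, 'z' as seg;
out = UNION ONSCHEMA t0, t1, t2;
dump out;
{code}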
Is this a known bug or a new one? Is there a workaround?
> Strange cast error with UNION
> -----------------------------
>
> Key: PIG-3987
> URL: https://issues.apache.org/jira/browse/PIG-3987
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.10.1
> Reporter: Giovanni Botta
> Attachments: .pig_header, .pig_schema, part-r-00000
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)