[
https://issues.apache.org/jira/browse/PIG-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035697#comment-15035697
]
Rohini Palaniswamy edited comment on PIG-3357 at 12/2/15 12:16 PM:
-------------------------------------------------------------------
One of our users recently ran into this issue.
+1 for documenting this in the Pig manual. But we need not do bytearray->float
typecasting. Declaring the schema as double works and is a simpler alternative. We
should mandate that the user declare the outputSchema as double whenever the
UDF returns a Python float.
{code}
@outputSchema("v: double")
def test(v):
    return v
{code}
Our user ran into the issue when they tried to do v > 0 ? 0 : 1, and res.result =
Integer.valueOf(((Float) res.result).intValue()); in POCast threw
java.lang.ClassCastException: java.lang.Double cannot be cast to
java.lang.Float. For those cases, fixing POCast to cast to Number instead
of the specific Long, Float, Double, BigDecimal and BigInteger types should
avoid those errors, even if the user wrongly declares the schema as float
instead of double. Since
http://pig.apache.org/docs/r0.15.0/basic.html#cast supports type casting
between all these types, it should be fine.
http://pig.apache.org/docs/r0.15.0/basic.html#cast does not contain information
for the BigDecimal and BigInteger types and needs to be updated.
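To illustrate the POCast point, here is a minimal standalone sketch. It is not the actual POCast code; the class and method names are made up for illustration. It only shows why a cast to a specific boxed type fails when the runtime object is a Double, while a cast to Number works for every boxed numeric type.

```java
import java.math.BigDecimal;

public class NumberCastSketch {
    // Mirrors the failing pattern: assumes the runtime object is a Float.
    // Throws ClassCastException if it is actually a Double.
    static Integer castViaFloat(Object result) {
        return Integer.valueOf(((Float) result).intValue());
    }

    // Proposed pattern: Long, Float, Double, BigDecimal and BigInteger
    // all extend java.lang.Number, so this cast succeeds for each of them.
    static Integer castViaNumber(Object result) {
        return Integer.valueOf(((Number) result).intValue());
    }

    public static void main(String[] args) {
        // A Python float returned from a Jython UDF arrives as a java.lang.Double,
        // even when the declared schema says float.
        Object fromUdf = Double.valueOf(1.0);

        System.out.println("via Number: " + castViaNumber(fromUdf));
        System.out.println("BigDecimal via Number: " + castViaNumber(new BigDecimal("2.9")));

        try {
            castViaFloat(fromUdf);
        } catch (ClassCastException e) {
            System.out.println("via Float failed: " + e.getClass().getName());
        }
    }
}
```

Note that intValue() truncates toward zero, which matches what the existing per-type casts already do.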
For cases like the AvroStorage store failure originally reported in this bug,
we can either
- leave it to the user to define outputSchema as double, as the typecasting is
done in the avro libraries and nothing can be done in Pig's AvroStorage about it.
- fix the JythonScriptEngine code to specially downcast Double to Float if
it sees a float defined in outputSchema. This would be the best fix. If this
is done, adding documentation to define the schema as double is not required.
The POCast change to Number will also be redundant for this scenario, but it
is probably good to do anyway to address other cases of schema mismatches
with casting.
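The second option could look roughly like the following. This is only a sketch of the idea, not the JythonScriptEngine code: the DeclaredType enum and the coerce method are hypothetical stand-ins for however the engine looks up the type declared in outputSchema.

```java
public class FloatDowncastSketch {
    // Hypothetical stand-in for the field type declared in outputSchema.
    enum DeclaredType { FLOAT, DOUBLE }

    // If the schema declares float but the UDF produced a Double,
    // downcast it so downstream storage sees the type it expects.
    static Object coerce(Object value, DeclaredType declared) {
        if (declared == DeclaredType.FLOAT && value instanceof Double) {
            return Float.valueOf(((Double) value).floatValue());
        }
        // Anything else (including null) is passed through unchanged.
        return value;
    }

    public static void main(String[] args) {
        Object fromUdf = Double.valueOf(1.0); // what a Python float becomes

        Object asFloat = coerce(fromUdf, DeclaredType.FLOAT);
        Object asDouble = coerce(fromUdf, DeclaredType.DOUBLE);

        System.out.println(asFloat.getClass().getName());
        System.out.println(asDouble.getClass().getName());
    }
}
```

The downcast can lose precision for values that do not fit in a float, but that is what the user asked for by declaring float in the schema.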
> Pig doesn't take care of declared float type and converts it to double
> ----------------------------------------------------------------------
>
> Key: PIG-3357
> URL: https://issues.apache.org/jira/browse/PIG-3357
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11
> Environment: cdh 4.3.0
> Reporter: Sergey
>
> Here is the script:
> {code}
> register /usr/lib/pig/lib/avro-1.7.4.jar;
> register /usr/lib/pig/lib/json-simple-1.1.jar;
> register /usr/lib/pig/piggybank.jar;
> register test.py using jython as udf;
> table_in = load 'in' as (v: float);
> table_out = foreach table_in generate udf.test(v);
> store table_out into 'out' using
> org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"name": "test", "type": "float"}');
> {code}
> Here is UDF:
> {code}
> @outputSchema("v: float")
> def test(v):
>     return v
> {code}
> Here is an input:
> {code}
> 1
> {code}
> Here is the stacktrace:
> java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: java.io.IOException: Cannot convert to float:class java.lang.Double
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.io.IOException: Cannot convert to float:class java.lang.Double
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:260)
> at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
> at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: java.io.IOException: Cannot convert to float:class java.lang.Double
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeFloat(PigAvroDatumWriter.java:281)
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:88)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
> ... 21 more