[jira] [Comment Edited] (PIG-3357) Pig doesn't take care of declared float type and converts it to double

Rohini Palaniswamy (JIRA) Wed, 02 Dec 2015 04:18:08 -0800

    [ https://issues.apache.org/jira/browse/PIG-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035697#comment-15035697 ]


Rohini Palaniswamy edited comment on PIG-3357 at 12/2/15 12:16 PM:
-------------------------------------------------------------------

One of our users recently reported this issue.

+1 for documenting this in the Pig manual. But we need not do bytearray->float 
typecasting. Defining the schema as double works and is a simpler alternative. 
We should mandate that the user define the outputSchema as double whenever the 
UDF returns a Python float.

{code}
@outputSchema("v: double")
def test(v):
   return v
{code}
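
For background on why double is the natural declaration here: a Python float is 
double precision, so a value returned from a Jython UDF would be expected to 
reach Pig as a java.lang.Double rather than a java.lang.Float. The sketch below 
only illustrates the precision difference in plain Python; the helper name 
to_float32 is made up for this example and is not part of Pig or Jython.

```python
import struct
import sys


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value.

    Hypothetical helper for illustration only, not a Pig or Jython API.
    """
    return struct.unpack("f", struct.pack("f", value))[0]


# A Python float has a 53-bit mantissa, i.e. it is an IEEE 754 double.
MANTISSA_BITS = sys.float_info.mant_dig

# 1.0 is exactly representable in 32 bits, so the round trip is lossless.
ONE_SURVIVES = to_float32(1.0) == 1.0

# 0.1 is not, so narrowing to 32 bits changes the value.
TENTH_SURVIVES = to_float32(0.1) == 0.1
```

Declaring the schema as double avoids any such narrowing, which is why it is 
the simpler workaround.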

Our user ran into the issue when they tried v > 0 ? 0 : 1, and the line 
res.result = Integer.valueOf(((Float) res.result).intValue()); in POCast threw 
java.lang.ClassCastException: java.lang.Double cannot be cast to 
java.lang.Float. For those cases, fixing POCast to cast to Number instead of 
Long, Float, Double, BigDecimal, or BigInteger should avoid the errors even if 
the user wrongly declares the schema as float instead of double. Since 
http://pig.apache.org/docs/r0.15.0/basic.html#cast supports type casting 
between all these types, it should be fine. 
http://pig.apache.org/docs/r0.15.0/basic.html#cast does not contain information 
for the BigDecimal and BigInteger types and needs to be updated.

For cases like the storing failure with AvroStorage originally reported in 
this bug, we can either
   - leave it to the user to define outputSchema as double, since the 
typecasting is done in the Avro libraries and nothing can be done in Pig's 
AvroStorage about it.
   - fix the JythonScriptEngine code to specially downcast Double to Float if 
it sees a float declared in outputSchema. This would be the best fix. If this 
is done, adding documentation to define the schema as double is not required. 
The POCast change to Number would also be redundant for this scenario, but it 
is probably good to do anyway to address other cases of schema mismatch with 
casting.
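
The second option can be modelled roughly as follows. This is a plain-Python 
sketch of the idea only, not the JythonScriptEngine code: the function names 
and the string type names "float" and "double" are assumptions made for the 
example, and a real fix would live in the Java conversion path.

```python
import struct


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float value.

    Note: struct raises OverflowError for magnitudes beyond the 32-bit
    range, whereas a Java (float) cast would yield infinity instead.
    """
    return struct.unpack("f", struct.pack("f", value))[0]


def coerce_to_declared(value, declared_type):
    """Hypothetical helper: adjust a UDF return value to its declared type.

    declared_type is the type name taken from the outputSchema string,
    e.g. "float" for @outputSchema("v: float").
    """
    if value is None:
        return None  # nulls pass through untouched
    if declared_type == "float":
        return to_float32(float(value))  # narrow to 32-bit precision
    if declared_type == "double":
        return float(value)  # already double precision
    return value  # other types are left as they are
```

With a conversion of this kind in place, a UDF declared as float would hand 
the storage layer a value that already matches the declared type.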



was (Author: rohini):
One of our users recently reported this issue.

+1 for documenting this in the Pig manual. But we need not do bytearray->float 
typecasting. Defining the schema as double works and is a simpler alternative. 
We should mandate that the user define the outputSchema as double whenever the 
UDF returns a Python float.

{code}
@outputSchema("v: double")
def test(v):
   return v
{code}

Our user ran into the issue when they tried v > 0 ? 0 : 1, and the line 
res.result = Integer.valueOf(((Float) res.result).intValue()); in POCast threw 
java.lang.ClassCastException: java.lang.Double cannot be cast to 
java.lang.Float. For those cases, fixing POCast to cast to Number instead of 
Long, Float, Double, BigDecimal, or BigInteger should avoid the errors. Since 
http://pig.apache.org/docs/r0.15.0/basic.html#cast supports type casting 
between all these types, it should be fine. 
http://pig.apache.org/docs/r0.15.0/basic.html#cast does not contain information 
for the BigDecimal and BigInteger types and needs to be updated.


> Pig doesn't take care of declared float type and converts it to double
> ----------------------------------------------------------------------
>
>                 Key: PIG-3357
>                 URL: https://issues.apache.org/jira/browse/PIG-3357
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11
>         Environment: cdh 4.3.0
>            Reporter: Sergey
>
> Here is the script:
> {code}
> register /usr/lib/pig/lib/avro-1.7.4.jar;
> register /usr/lib/pig/lib/json-simple-1.1.jar;
> register /usr/lib/pig/piggybank.jar;
> register test.py using jython as udf;
> table_in = load 'in' as (v: float);
> table_out = foreach table_in generate udf.test(v);
> store table_out into 'out' using
>     org.apache.pig.piggybank.storage.avro.AvroStorage('schema',
>     '{"name": "test", "type": "float"}');
> {code}
> Here is UDF:
> {code}
> @outputSchema("v: float")
> def test(v):
>   return v
> {code}
> Here is an input:
> {code}
> 1
> {code}
> Here is the stacktrace:
> java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: java.io.IOException: Cannot convert to float:class java.lang.Double
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.io.IOException: Cannot convert to float:class java.lang.Double
>         at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:260)
>         at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
>         at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
>         at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
>         at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:679)
> Caused by: java.io.IOException: Cannot convert to float:class java.lang.Double
>         at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeFloat(PigAvroDatumWriter.java:281)
>         at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:88)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
>         at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
>         ... 21 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
