[ 
https://issues.apache.org/jira/browse/PIG-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101714#comment-13101714
 ] 

Daniel Dai commented on PIG-2271:
---------------------------------

Can you do these:
1. Get the output schema for MyUDF. (describe activities)
2. Use a different constructor for BinStorage: 
BinStorage("org.apache.pig.builtin.Utf8StorageConverter")
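
A minimal sketch of both steps applied to the second script from the report. It assumes MyUDF is already registered, that the caster class name is passed to BinStorage as a single-quoted string (Pig Latin string literals use single quotes), and that DESCRIBE is run right after the FOREACH that calls MyUDF; adjust to the actual script as needed:
{code}
-- Step 2: reload the binary data with an explicit caster
activities = LOAD 'activities'
    USING BinStorage('org.apache.pig.builtin.Utf8StorageConverter')
    AS (sid:chararray,
        activities:bag { activity: (timestamp:long, name:chararray) });
activities = FOREACH activities GENERATE sid, MyUDF(activities);

-- Step 1: print the schema of the relation produced by MyUDF
DESCRIBE activities;
{code}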

> PIG regression (in BinStorage?) between 0.8.1 and 0.9.x
> -------------------------------------------------------
>
>                 Key: PIG-2271
>                 URL: https://issues.apache.org/jira/browse/PIG-2271
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Vincent BARAT
>
> I'm using the 0.9.x branch (tested on 2011-09-07).
> I have a UDF that takes a bag as input:
> {code}
> public DataBag exec(Tuple input) throws IOException
> {
> /* Get the activity bag */
> DataBag activityBag = (DataBag) input.get(0);
> ...
> {code}
> My input data are read from a text file 'activity' (the same issue occurs when they are 
> read from HBase):
> {code}
> 00,1239698069000, <- this is the line that is not correctly handled
> 01,1239698505000,b
> 01,1239698369000,a
> 02,1239698413000,b
> 02,1239698553000,c
> 02,1239698313000,a
> 03,1239698316000,a
> 03,1239698516000,c
> 03,1239698416000,b
> 03,1239698621000,d
> 04,1239698417000,c
> {code}
> My first script is working correctly:
> {code}
> activities = LOAD 'activity' USING PigStorage(',') AS (sid:chararray, 
> timestamp:long, name:chararray);
> activities = GROUP activities BY sid;
> activities = FOREACH activities GENERATE group, MyUDF(activities.(timestamp, 
> name));
> store activities;
> {code}
> N.B. the name of the first activity is correctly set to null in my UDF 
> function.
> The issue occurs when I store my data into a binary file and reload them 
> before processing (I do this to improve the computation time, since HDFS is 
> much faster than HBase).
> The second script, which triggers an error (this script works correctly with Pig 
> 0.8.1):
> {code}
> activities = LOAD 'activity' USING PigStorage(',') AS (sid:chararray, 
> timestamp:long, name:chararray);
> activities = GROUP activities BY sid;
> activities = FOREACH activities GENERATE group, activities.(timestamp, name);
> STORE activities INTO 'activities' USING BinStorage;
> activities = LOAD 'activities' USING BinStorage AS (sid:chararray, 
> activities:bag { activity: (timestamp:long, name:chararray) });
> activities = FOREACH activities GENERATE sid, MyUDF(activities);
> store activities;
> {code}
> In this script, when MyUDF is called, activityBag is null, and a warning is 
> issued:
> {code}
> 2011-09-07 15:24:05,365 | WARN | Thread-30 | PigHadoopLogger | 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast:
>  Unable to interpret value {(1239698069000,)} in field being converted to 
> type bag, caught ParseException <Cannot convert (1239698069000,) to 
> null:(timestamp:long,name:chararray)> field discarded
> {code}
> I guess that the regression is located in BinStorage...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
