Cannot directly cast output of UDF
----------------------------------

                 Key: PIG-1718
                 URL: https://issues.apache.org/jira/browse/PIG-1718
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.7.0
         Environment: Macbook Pro 6.2, Ubuntu 10.04 AMD64, CDH3 beta 3
            Reporter: Mike Dillon
            Priority: Minor


I'm in the process of writing a suite of UDFs to deal with nested JSON data 
inside of Pig. In one case, I created a UDF of type EvalFunc<String> and wanted 
to use it like so:

{code}
RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
IN = foreach RAW generate id, ExtractString(json, 'count') as count:int
{code}

When I do this, I get the following error:

{quote}
ERROR 1022: Type mismatch merging schema prefix. Field Schema: chararray. Other 
Field Schema: count: int
{quote}

I can work around it by adding another projection with just a cast (as below), 
but I'd prefer if the form I just first just worked.

{code}
RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
MID = foreach RAW generate id, ExtractString(json, 'count') as count
IN = foreach MID generate id, (int)count
{code}

I'd prefer not to have to have ExtractInteger extends EvalFun<Integer> if I can 
avoid it. In our case, it gets even more cumbersome because we want to have 
something like ExtractStringTuple extends EvalFunc<Tuple> that returns a tuple 
of strings without parsing the JSON over and over again:

{code}
RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
IN = foreach RAW generate id, ExtractStringTuple(json, 'name', 'count', 'mean') 
as (name, count:int, mean:double);
{code}

As indicated, I have tested this with Pig 0.7.0. My apologies if this is 
already fixed in 0.8 since I was not able to test with a newer version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to