[ https://issues.apache.org/jira/browse/PIG-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931304#action_12931304 ]
Santhosh Srinivasan commented on PIG-1718:
------------------------------------------

This should be fixed as part of the semantics cleanup. The foreach allows type information to be specified even though the current semantics are plain aliasing.

> Cannot directly cast output of UDF
> ----------------------------------
>
>                 Key: PIG-1718
>                 URL: https://issues.apache.org/jira/browse/PIG-1718
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>         Environment: Macbook Pro 6.2, Ubuntu 10.04 AMD64, CDH3 beta 3
>            Reporter: Mike Dillon
>            Priority: Minor
>
> I'm in the process of writing a suite of UDFs to deal with nested JSON data
> inside of Pig. In one case, I created a UDF of type EvalFunc<String> and
> wanted to use it like so:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractString(json, 'count') as count:int;
> {code}
> When I do this, I get the following error:
> {quote}
> ERROR 1022: Type mismatch merging schema prefix. Field Schema: chararray.
> Other Field Schema: count: int
> {quote}
> I can work around it by adding another projection with just a cast (as
> below), but I'd prefer it if the first form just worked.
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> MID = foreach RAW generate id, ExtractString(json, 'count') as count;
> IN = foreach MID generate id, (int)count;
> {code}
> I'd prefer not to have to write an ExtractInteger extends EvalFunc<Integer>
> if I can avoid it. In our case, it gets even more cumbersome because we want
> to have something like ExtractStringTuple extends EvalFunc<Tuple> that
> returns a tuple of strings without parsing the JSON over and over again:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractStringTuple(json, 'name', 'count',
> 'mean') as (name, count:int, mean:double);
> {code}
> As indicated, I have tested this with Pig 0.7.0. My apologies if this is
> already fixed in 0.8 since I was not able to test with a newer version.
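
For illustration only, the behaviour requested in the issue (extract the strings once, then convert each field to the type named in the as clause) can be sketched in plain Java. This is not Pig code and does not use any Pig classes: TypedFieldCast, FieldType, cast and castAll are hypothetical names, and turning malformed input into null is an assumption of this sketch rather than a statement about how Pig implements casts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: plain Java, no Pig classes involved.
final class TypedFieldCast {

    // Hypothetical stand-ins for the type names used in the issue's schemas.
    enum FieldType { CHARARRAY, INT, DOUBLE }

    // Convert one already-extracted string to its declared type.
    // Assumption: missing or malformed input becomes null instead of
    // failing the whole record.
    static Object cast(String raw, FieldType type) {
        if (raw == null) {
            return null;
        }
        try {
            switch (type) {
                case INT:
                    return Integer.valueOf(raw.trim());
                case DOUBLE:
                    return Double.valueOf(raw.trim());
                default:
                    return raw; // CHARARRAY: keep the string unchanged
            }
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Apply a declared schema, position by position, to strings that were
    // extracted in a single pass over the JSON.
    static List<Object> castAll(List<String> raw, List<FieldType> schema) {
        if (raw.size() != schema.size()) {
            throw new IllegalArgumentException("field count does not match schema");
        }
        List<Object> out = new ArrayList<>(raw.size());
        for (int i = 0; i < raw.size(); i++) {
            out.add(cast(raw.get(i), schema.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors "as (name, count:int, mean:double)" from the issue.
        List<Object> row = castAll(
                Arrays.asList("widget", "42", "3.5"),
                Arrays.asList(FieldType.CHARARRAY, FieldType.INT, FieldType.DOUBLE));
        System.out.println(row);
    }
}
```

The conversion is kept separate from the extraction so that the JSON would only need to be parsed once per record, which is the point of the ExtractStringTuple request.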