[ 
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5272:
------------------------------------
      Resolution: Fixed
        Assignee: Joshua Juen
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

+1. Committed to trunk. Thanks for fixing this [~juen1jp].

> BagToTuple output schema is incorrect
> -------------------------------------
>
>                 Key: PIG-5272
>                 URL: https://issues.apache.org/jira/browse/PIG-5272
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.17.0
>            Reporter: Joshua Juen
>            Assignee: Joshua Juen
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.18.0
>
>         Attachments: BagToTupleSchema.patch
>
>
> The output schema from BagToTuple is nonsensical causing problems using the 
> tuple later in the same script. 
> For example: Given a bag: { data:chararray }, calling BagToTuple yields the 
> schema: ( data:chararray )
> But, this makes no sense since if the above bag contains: {data1, data2, 
> data3} entries, the output tuple from BagToTuple will be:
> (data1:chararray, data2:chararray, data3:chararray) != (data:chararray),the 
> declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial 
> validation phase. Thus, I believe the output schema from the UDF should be 
> modified to be type tuple without the number of fields being fixed to the 
> number of columns in the input bag. 
> Under the current way, the elements in the tuple cannot be accessed in the 
> script after calling BagToTuple without getting an incompatible type error. 
> We have modified the UDF in our internal UDF jars to work around the issue. 
> Let me know if this sounds reasonable and I can generate the patch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to