[ https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-5272:
------------------------------------
    Summary: BagToTuple output schema is incorrect  (was: BagToTuple Output Schema)

> BagToTuple output schema is incorrect
> -------------------------------------
>
>                 Key: PIG-5272
>                 URL: https://issues.apache.org/jira/browse/PIG-5272
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.17.0
>            Reporter: Joshua Juen
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.18.0
>
>         Attachments: BagToTupleSchema.patch
>
>
> The output schema from BagToTuple is nonsensical, which causes problems when
> the tuple is used later in the same script.
> For example, given a bag { data:chararray }, calling BagToTuple yields the
> schema ( data:chararray ).
> But this makes no sense: if the bag contains the entries {data1, data2,
> data3}, the output tuple from BagToTuple will be
> (data1:chararray, data2:chararray, data3:chararray) != (data:chararray), the
> declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial
> validation phase. Thus, I believe the output schema from the UDF should be
> changed to a plain tuple type, without fixing the number of fields to the
> number of columns in the input bag.
> With the current behavior, the elements of the tuple cannot be accessed in
> the script after calling BagToTuple without getting an incompatible type
> error.
> We have modified the UDF in our internal UDF jars to work around the issue.
> Let me know if this sounds reasonable and I can generate the patch.
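
The mismatch described in the report can be illustrated with a small standalone sketch. It does not use the Pig API: a bag is modeled as a list of tuples and a tuple as a list of fields. The class and method names (BagToTupleSchemaSketch, bagToTuple, declaredFieldCount) are hypothetical and chosen only for illustration. It assumes, as the report describes, that the declared schema is copied from the bag's inner tuple while the actual output holds one field per bag entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagToTupleSchemaSketch {

    // Flatten a bag (list of tuples) into a single tuple (list of fields),
    // mirroring the behavior described in the report. Not the Pig implementation.
    public static List<Object> bagToTuple(List<List<Object>> bag) {
        List<Object> out = new ArrayList<>();
        for (List<Object> tuple : bag) {
            out.addAll(tuple);
        }
        return out;
    }

    // Number of fields in a declared output schema that is simply copied
    // from the schema of the bag's inner tuple (the behavior reported as wrong).
    public static int declaredFieldCount(List<String> bagTupleSchema) {
        return bagTupleSchema.size();
    }

    public static void main(String[] args) {
        // Bag schema from the report: { data:chararray }
        List<String> bagTupleSchema = Arrays.asList("data:chararray");

        // Bag contents from the report: {data1, data2, data3}
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("data1"),
                Arrays.<Object>asList("data2"),
                Arrays.<Object>asList("data3"));

        List<Object> tuple = bagToTuple(bag);

        System.out.println("declared fields: " + declaredFieldCount(bagTupleSchema));
        System.out.println("actual fields:   " + tuple.size());
        System.out.println("actual tuple:    " + tuple);
    }
}
```

The declared field count depends only on the bag's schema, while the actual field count depends on how many entries the bag holds at run time. That is why the report proposes declaring the output as a tuple without a fixed number of fields.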