[ https://issues.apache.org/jira/browse/PIG-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172107#comment-14172107 ]
Daniel Dai commented on PIG-4227:
---------------------------------
[~cheolsoo], I looked at scriptingudf.complexTypes: the Python UDF returns a list of
tuples for the bag field. When the output is serialized, the bag adds |{_ and each tuple
adds |(_, so this part seems OK.
I don't fully understand the issue in the description. Is it because Jython
automatically adds a tuple inside a list but streaming Python does not?
> Streaming Python UDF handles bag outputs incorrectly
> ----------------------------------------------------
>
> Key: PIG-4227
> URL: https://issues.apache.org/jira/browse/PIG-4227
> Project: Pig
> Issue Type: Bug
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.14.0
>
> Attachments: PIG-4227-1.patch
>
>
> I have a UDF that generates different outputs when run as a Jython UDF and as a
> streaming Python UDF.
> {code:title=jython}
> {([[BBC Worldwide]])}
> {code}
> {code:title=streaming python}
> {(BC Worldwid)}
> {code}
> The problem is that streaming python encodes a bag output incorrectly. For
> this particular example, it serializes the output string as follows:
> {code}
> |{_[[BBC Worldwide]]|}_
> {code}
> where '|' and '\_' wrap the bag delimiters '\{' and '\}', i.e. '\{' => '|\{\_'
> and '\}' => '|\}\_'.
> But this is wrong because a bag must contain tuples, not chararrays. The
> correct encoding is as follows:
> {code}
> |{_|(_[[BBC Worldwide]]|)_|}_
> {code}
> where '|' and '_' wrap the tuple delimiters '(' and ')' as well as the bag delimiters.
> The incorrect encoding results in truncated output, as in the streaming python result above.
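To make the difference between the two encodings concrete, here is a minimal Python sketch. It is not Pig's actual streaming serializer or the code in PIG-4227-1.patch. The function `encode`, its `wrap_bag_elements` flag, and the field delimiter `|,_` are hypothetical; the issue only specifies the wrapped bag and tuple delimiters. Python lists stand in for Pig bags and Python tuples for Pig tuples. With `wrap_bag_elements=True`, any non-tuple bag element is wrapped in a one-field tuple, which is one way to get the behaviour the question above attributes to Jython.

```python
# Wrapped delimiters as given in the issue: '|' and '_' surround the raw
# bag delimiters '{', '}' and tuple delimiters '(', ')'.
BAG_START, BAG_END = "|{_", "|}_"
TUPLE_START, TUPLE_END = "|(_", "|)_"
# Assumption: the issue does not give a field separator; "|,_" is a
# hypothetical value that follows the same wrapping pattern.
FIELD_DELIM = "|,_"


def encode(value, wrap_bag_elements=True):
    """Hypothetical encoder: Python list -> Pig bag, Python tuple -> Pig tuple."""
    if isinstance(value, tuple):
        fields = (encode(v, wrap_bag_elements) for v in value)
        return TUPLE_START + FIELD_DELIM.join(fields) + TUPLE_END
    if isinstance(value, list):
        items = value
        if wrap_bag_elements:
            # A bag may only contain tuples, so wrap each non-tuple element
            # (e.g. a bare chararray) in a one-field tuple.
            items = [v if isinstance(v, tuple) else (v,) for v in value]
        elements = (encode(v, wrap_bag_elements) for v in items)
        return BAG_START + FIELD_DELIM.join(elements) + BAG_END
    # Scalars (e.g. chararrays) are emitted as plain text.
    return str(value)


udf_output = ["[[BBC Worldwide]]"]  # a bag holding a bare string
print(encode(udf_output, wrap_bag_elements=False))  # |{_[[BBC Worldwide]]|}_
print(encode(udf_output))  # |{_|(_[[BBC Worldwide]]|)_|}_
```

With `wrap_bag_elements=False` the sketch reproduces the incorrect string from the description; with the default it produces the correct one. A list of tuples, as in scriptingudf.complexTypes, encodes the same way under both settings because its elements are already tuples.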
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)