Cheolsoo Park created PIG-4227:
----------------------------------
Summary: Streaming Python UDF handles bag outputs incorrectly
Key: PIG-4227
URL: https://issues.apache.org/jira/browse/PIG-4227
Project: Pig
Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.15.0
I have a UDF that generates different outputs when run as a Jython UDF and as a
streaming Python UDF.
{code:title=jython}
{([[BBC Worldwide]])}
{code}
{code:title=streaming python}
{(BC Worldwid)}
{code}
The problem is that streaming Python encodes a bag output incorrectly. For this
particular example, it serializes the output string as follows:
{code}
|{_[[BBC Worldwide]]|}_
{code}
where '|' and '\_' wrap the bag delimiters '\{' and '\}', i.e. '\{' => '|\{\_' and
'\}' => '|\}\_'.
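A minimal sketch of the incorrect serialization described above, assuming only the delimiter wrapping stated in this issue. The constant and function names are hypothetical and this is not Pig's streaming controller code; it handles a bag with a single chararray element to avoid guessing at the element separator.

```python
# Hypothetical illustration of the reported (incorrect) bag encoding.
# Not Pig's actual streaming Python code; names are made up for this sketch.
BAG_START = "|{_"  # '{' wrapped by '|' and '_'
BAG_END = "|}_"    # '}' wrapped by '|' and '_'


def encode_single_item_bag_incorrectly(value: str) -> str:
    """Write the chararray directly between the bag delimiters,
    with no tuple wrapper around it (the behavior reported here)."""
    return BAG_START + value + BAG_END


print(encode_single_item_bag_incorrectly("[[BBC Worldwide]]"))
# prints |{_[[BBC Worldwide]]|}_
```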
But this is wrong because a bag must contain tuples, not chararrays. The
correct encoding is as follows:
{code}
|{_|(_[[BBC Worldwide]]|)_|}_
{code}
where '|' and '\_' wrap the tuple delimiters '(' and ')' as well as the bag delimiters.
Omitting the tuple delimiters is what produces the truncated streaming Python output above.
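A minimal sketch of the correct encoding, plus one plausible explanation of the truncation. The decoder below is an assumption, not a reading of Pig's source: it supposes the receiving side treats each bag element as a tuple and strips the 3-character '|(\_' and '|)\_' wrappers by length. Applied to the incorrect encoding, that strips '[[B' and 'e]]' from the chararray and yields exactly 'BC Worldwid', matching the streaming Python output above. All names are hypothetical.

```python
# Hypothetical illustration; delimiter strings follow this issue,
# but the functions are illustrative, not Pig's implementation.
BAG_START, BAG_END = "|{_", "|}_"
TUPLE_START, TUPLE_END = "|(_", "|)_"


def encode_bag_of_one_tuple(value: str) -> str:
    """Correct encoding: the chararray sits inside a tuple inside the bag."""
    return BAG_START + TUPLE_START + value + TUPLE_END + BAG_END


def decode_bag_of_one_tuple(encoded: str) -> str:
    """Assumed naive decoder: drop the bag wrapper, then assume the rest is
    a tuple and drop len(TUPLE_START) / len(TUPLE_END) characters."""
    inner = encoded[len(BAG_START):-len(BAG_END)]
    return inner[len(TUPLE_START):-len(TUPLE_END)]


value = "[[BBC Worldwide]]"
print(encode_bag_of_one_tuple(value))
# prints |{_|(_[[BBC Worldwide]]|)_|}_
print(decode_bag_of_one_tuple(encode_bag_of_one_tuple(value)))
# prints [[BBC Worldwide]]
print(decode_bag_of_one_tuple(BAG_START + value + BAG_END))
# prints BC Worldwid
```

With the tuple wrapper present, the round trip is lossless. Without it, the length-based stripping eats real characters from the chararray.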