Carlos Balduz created PIG-4583:
----------------------------------
Summary: Allow UDFs to return multiple values (not inside a Bag or
a Tuple)
Key: PIG-4583
URL: https://issues.apache.org/jira/browse/PIG-4583
Project: Pig
Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Carlos Balduz
When writing a UDF, you are forced to return only one value, or multiple values
but inside a Tuple or a Bag. Although this may not seem a big problem, when
using Pig in production and performing multiple JOINs, GROUP BYs and any
operations that make your schema more and more complicated, it would be nice to
create a UDF to reduce the size of your schema, for example:
```
rel1 = load 'a' using PigStorage(';', '-schema');
rel2 = load 'b' using PigStorage(';', '-schema');
joined = join rel1 by id_whatever, rel2 by id_whatever;
... perform operations
another_rel = load 'c' using PigStorage('.'.'-schema');
final_rel = join another_rel by id_whatever, joined by id_whatever;
```
Will have an schema like:
```
describe final_rel;
rel1::joined::id_whatever, rel1:joined::field_1, ......
```
When you have scripts with hundreds or thousands of lines of code, you end up
having more foreachs to rename fields than with actual code. Therefore, I wrote
a UDF to handle this so I wouldn't have to write a foreach to rename 100 fields
one by one.
However, due to Pig's limitation of returning only one value, I must place my
return values inside a Tupe or a Bag, flatten it, and have another
`something::` for each of the fields.
Can we remove this limitation? And if it is done, perhaps upload the UDF I
wrote... I think it is a VERY useful function for production environments and
large scripts.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)