[ 
https://issues.apache.org/jira/browse/PIG-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3269:
-------------------------------

    Description: 
This is another language improvement using the same approach as in PIG-3268.

Currently, Pig has no IN operator. To mimic it, users often have to 
chain several OR conditions.

For example,
{code}
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY 
   (i == 1) OR
   (i == 22) OR
   (i == 333) OR
   (i == 4444) OR
   (i == 55555);
{code}
But this can be rewritten more compactly using an IN operator as 
follows: 
{code}
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1,22,333,4444,55555);
{code}
I propose that we implement the IN operator in the following manner:
* Add built-in UDFs that take expressions as args. For the IN operator 
above, for example, we can define a UDF such as {{builtInUdf(i, 1, 22, 
333, 4444, 55555)}}.
* Add syntactic sugar for these built-in UDFs.
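
As a rough illustration of what such a built-in UDF would compute, here is a minimal sketch in plain Java. It is not the code from the attached patches: the class name {{InSketch}}, the method name {{in}}, and the choice that a null value matches nothing are all assumptions made for this example. A real UDF would presumably extend {{EvalFunc<Boolean>}} and read its arguments from the input tuple instead.

```java
// Hypothetical stand-in for the proposed built-in UDF; not Pig's actual code.
final class InSketch {
    // Returns true when `value` equals any of `candidates`.
    // Mirrors (i == 1) OR (i == 22) OR ... from the example above.
    // Assumption: a null value matches nothing.
    static boolean in(Object value, Object... candidates) {
        if (value == null) {
            return false;
        }
        for (Object candidate : candidates) {
            if (value.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same value list as in the FILTER example.
        System.out.println(in(22, 1, 22, 333, 4444, 55555));
        System.out.println(in(7, 1, 22, 333, 4444, 55555));
    }
}
```

With this shape, {{i IN (1,22,333,4444,55555)}} would simply be sugar for a call that passes {{i}} first and the listed values after it.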


  was:
This is another language improvement using the same approach as in PIG-3268.

Currently, Pig has no IN operator. To mimic it, users often have to 
chain several OR conditions.

For example,
{code}
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY 
   (i == 1) OR
   (i == 22) OR
   (i == 333) OR
   (i == 4444) OR
   (i == 55555);
{code}
But this can be rewritten more compactly using an IN operator as 
follows: 
{code}
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1,22,333,4444,55555);
{code}
I propose that we implement the IN operator in the following manner:
* Add built-in UDFs that take expressions as args. For the IN operator 
above, for example, we can define a UDF such as {{builtInUdf(i, 1, 22, 
333, 4444, 55555)}}.
* Add syntactic sugar for these built-in UDFs.

As with PIG-3268, this approach requires a limit on the number of values, 
again because we need to populate the full list of possible argument 
schemas in {{EvalFunc.getArgToFuncMapping}}. For now, I arbitrarily chose 50, 
but it can be easily changed.



ReviewBoard request: https://reviews.apache.org/r/10337/
                
> In operator support
> -------------------
>
>                 Key: PIG-3269
>                 URL: https://issues.apache.org/jira/browse/PIG-3269
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs, parser
>    Affects Versions: 0.11
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12
>
>         Attachments: PIG-3269-2.patch, PIG-3269-3.patch, PIG-3269.patch

