Cheolsoo Park created PIG-3269:
----------------------------------

             Summary: In operator support
                 Key: PIG-3269
                 URL: https://issues.apache.org/jira/browse/PIG-3269
             Project: Pig
          Issue Type: New Feature
          Components: internal-udfs, parser
    Affects Versions: 0.11
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.12


This is another language improvement using the same approach as in PIG-3268.

Currently, Pig has no support for the IN operator. To mimic it, users often have to 
chain several OR conditions.

For example,
{code}
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY 
   (i == 1) OR
   (i == 22) OR
   (i == 333) OR
   (i == 4444) OR
   (i == 55555);
{code}
But this can be rewritten more compactly using the IN operator as 
follows: 
{code}
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1,22,333,4444,55555);
{code}
I propose that we implement the IN operator in the following manner:
* Add built-in UDFs that take expressions as args. For the example above, 
we can define a UDF such as {{builtInUdf(i, 1, 22, 333, 4444, 55555)}}.
* Add syntactical sugar for these built-in UDFs.
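
To make the first bullet concrete, here is a minimal sketch of the semantics such a UDF might have, written as plain Java without any Pig dependency. The class name {{InSketch}}, the method {{in}}, and the null handling are assumptions for illustration only; the actual built-in UDF would extend Pig's {{EvalFunc}} and may behave differently.

```java
public class InSketch {
    // Hypothetical stand-in for the proposed built-in UDF: the first argument
    // is the value under test, the remaining arguments are the candidates.
    public static Boolean in(Object value, Object... candidates) {
        if (value == null) {
            // Assumption: a null input yields null rather than false.
            return null;
        }
        for (Object candidate : candidates) {
            if (value.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors builtInUdf(i, 1, 22, 333, 4444, 55555) for i = 22 and i = 7.
        System.out.println(in(22, 1, 22, 333, 4444, 55555));
        System.out.println(in(7, 1, 22, 333, 4444, 55555));
    }
}
```

With the syntactic sugar from the second bullet, the parser would presumably translate {{i IN (1,22,333,4444,55555)}} into a call of this shape.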

Similarly to PIG-3268, this approach requires a limit on the number of values. 
This is again because we need to populate the full list of possible arg 
schemas in {{EvalFunc.getArgToFuncMapping}}. For now, I arbitrarily chose 50, 
but it can easily be changed.
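
The reason a limit is needed can be sketched as follows: every accepted argument count has to be listed explicitly, so the list must be finite. This is plain Java, not the Pig API. Lists of type names stand in for Pig's schema objects, the names {{ArgSchemaSketch}} and {{signaturesFor}} are hypothetical, and it assumes all arguments share one type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ArgSchemaSketch {
    // The arbitrary limit on the number of values mentioned above.
    public static final int MAX_VALUES = 50;

    // Builds one signature per allowed value count:
    // (type, type), (type, type, type), ... up to maxValues values.
    public static List<List<String>> signaturesFor(String type, int maxValues) {
        List<List<String>> signatures = new ArrayList<>();
        for (int n = 1; n <= maxValues; n++) {
            // n candidate values plus the one value under test.
            signatures.add(Collections.nCopies(n + 1, type));
        }
        return signatures;
    }

    public static void main(String[] args) {
        List<List<String>> signatures = signaturesFor("int", MAX_VALUES);
        System.out.println(signatures.size() + " signatures for int");
        System.out.println("shortest: " + signatures.get(0));
    }
}
```

Under these assumptions, raising the limit only changes the loop bound, at the cost of a longer list for each supported type.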


