[ https://issues.apache.org/jira/browse/PIG-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628478#comment-13628478 ]
Cheolsoo Park commented on PIG-3268:
------------------------------------
[~aniket486], thank you for your comment, but that's not my issue. In your
example, I build (c=c2 ? e2 : (c=c1 ? e1 : e3)).
Let me explain my issue with an example.
{code}
A = LOAD '1.txt' USING PigStorage(',') AS (i:int);
B = FOREACH A GENERATE i, ( -- Note: I have an extra column "i" besides the CASE expression
CASE (i % 3)
WHEN 0 THEN '3n'
WHEN 1 THEN '3n+1'
ELSE '3n+2'
END
);
{code}
This fails with the following error:
{code}
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while
executing [Mod (Name: Mod[int] - scope-9 Operator Key: scope-9) children:
[[POProject (Name: Project[int][*] - scope-7 Operator Key: scope-7) children:
null at []], [ConstantExpression (Name: Constant(3) - scope-8 Operator Key:
scope-8) children: null at []]] at []]: java.lang.ClassCastException:
org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.Number
{code}
When I compare the EXPLAIN output of CASE against that of manually written
nested bincond operators, the only difference I can see is the following:
{code:title=CASE}
| |---Equal To[boolean] - scope-11
| | |
| | |---Mod[int] - scope-9
| | | |
| | | |---Project[int][*] - scope-7 // this line
| | | |
| | | |---Constant(3) - scope-8
{code}
{code:title=Bincond}
| |---Equal To[boolean] - scope-11
| | |
| | |---Mod[int] - scope-9
| | | |
| | | |---Project[int][1] - scope-7 // this line
| | | |
| | | |---Constant(3) - scope-8
{code}
I am puzzled why "i" in "(i % 3)" is translated to "Project[int][*]" in CASE,
whereas it is "Project[int][1]" in nested bincond operators.
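To illustrate why this difference matters, here is a minimal Java sketch. It is a toy model, not Pig's actual code: the class ProjectStarSketch and the helpers project and mod are hypothetical names. It assumes that a [*] projection hands the whole tuple to its parent while [1] hands over a single field, which would be consistent with the ClassCastException above.

```java
import java.util.Arrays;
import java.util.List;

public class ProjectStarSketch {
    // Hypothetical marker meaning "project every column" (the [*] case).
    public static final int STAR = -1;

    // Toy projection: STAR returns the whole tuple, otherwise a single field.
    public static Object project(List<Object> tuple, int column) {
        return column == STAR ? tuple : tuple.get(column);
    }

    // Toy Mod[int]: assumes its left input is numeric, like the failing operator.
    public static int mod(Object lhs, int rhs) {
        return ((Number) lhs).intValue() % rhs;
    }

    public static void main(String[] args) {
        List<Object> tuple = Arrays.<Object>asList(7, 7);

        // Single-column projection yields a Number, so the modulo works.
        System.out.println(mod(project(tuple, 1), 3)); // prints 1

        // Star projection yields the tuple itself, so the cast to Number fails.
        try {
            mod(project(tuple, STAR), 3);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: tuple is not a Number");
        }
    }
}
```

If this model is right, the fix belongs wherever CASE builds its projection of "i", not in the Mod operator itself.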
Thanks!
> Case statement support
> ----------------------
>
> Key: PIG-3268
> URL: https://issues.apache.org/jira/browse/PIG-3268
> Project: Pig
> Issue Type: New Feature
> Components: internal-udfs, parser
> Affects Versions: 0.11
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3268-2.patch, PIG-3268-3.patch, PIG-3268-4.patch,
> PIG-3268.patch
>
>
> Currently, Pig has no support for a case statement. To mimic it, users often
> use nested bincond operators. However, that easily becomes unreadable when
> there are multiple levels of nesting.
> For example,
> {code}
> a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
> b = FOREACH a GENERATE (
> i % 3 == 0 ? '3n' : (i % 3 == 1 ? '3n + 1' : '3n + 2')
> );
> {code}
> This can be rewritten much more cleanly using a case statement as follows:
> {code}
> a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
> b = FOREACH a GENERATE (
> CASE i % 3
> WHEN 0 THEN '3n'
> WHEN 1 THEN '3n + 1'
> ELSE '3n + 2'
> END
> );
> {code}
> I propose that we implement the case statement in the following manner:
> * Add built-in UDFs that take expressions as args. For the aforementioned
> case statement, for example, we can define a UDF such as
> {{builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2')}}.
> * Add syntactic sugar for these built-in UDFs.
> In fact, I borrowed this idea from HIVE-164.
> One downside of this approach is that all the possible args schemas of these
> UDFs must be pre-computed. Specifically, we need to populate the full list of
> possible args schemas in {{EvalFunc.getArgToFuncMapping}}.
> In particular, since we obviously cannot support infinitely long args, it is
> necessary to impose a limit on the number of WHEN branches. For now, I
> arbitrarily chose 50, but it can be easily changed.