[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340555#comment-16340555 ]

Koji Noguchi commented on PIG-4608:
-----------------------------------

{quote}I didn't see UPDATE/DROP in a single statement in the example. Are we 
not going to support both in the same statement? I actually prefer having them in 
the same statement, as I feel users usually think about adjusting all columns 
at the same time.
{quote}
This may be because of what I requested in one of my previous comments: "For now, 
can we just require separate statements for update and delete?" 
I just wanted to keep it simple and leave combining them for later, when we 
have more use cases.

I'm also worried about confusion when positional indexes and field names overlap.
Say, {{A:(f0:int, f1:int, f2:int, f3:int)}}
{code:java}
B = FOREACH A drop f1 , update 2 with $1 ;
{code}
Is this code updating {{f2}} with the value of {{f1}}?
Or updating {{f3}} with the value of {{f2}}? Or something else?
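The small Python sketch below makes the ambiguity concrete. It models a row of {{A}} as a list and applies {{drop f1, update 2 with $1}} under each of the two readings above. The function names and both resolution rules are hypothetical illustrations. They are not how Pig's parser actually resolves positions.

```python
# Hypothetical model of: B = FOREACH A drop f1, update 2 with $1;
# A's schema is (f0, f1, f2, f3); a row is a plain list of ints.
SCHEMA = ["f0", "f1", "f2", "f3"]


def resolve_against_input(row):
    """Reading 1: positions refer to A's schema, i.e. before the drop."""
    out = list(row)
    out[2] = row[1]              # update f2 (position 2 in A) with f1 ($1 in A)
    del out[SCHEMA.index("f1")]  # then drop f1 -> remaining (f0, f2, f3)
    return out


def resolve_after_drop(row):
    """Reading 2: positions refer to the schema left after the drop."""
    out = [v for name, v in zip(SCHEMA, row) if name != "f1"]  # (f0, f2, f3)
    out[2] = out[1]              # update f3 (position 2 now) with f2 ($1 now)
    return out


row = [10, 11, 12, 13]
print(resolve_against_input(row))  # [10, 11, 13]
print(resolve_after_drop(row))     # [10, 12, 12]
```

The same input row produces two different outputs. Keeping DROP and UPDATE in separate statements sidesteps that question.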

> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>            Priority: Major
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH … GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
> USING PigStorage()
> AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
> 5 as f1,
> f1+f2 as new_sum
> ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do 
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large 
> number of fields (in the 20-200 range). Often, we only need to modify a few 
> fields. The FOREACH ... UPDATE statement allows the 
> developer to focus on the actual logical changes instead of having to list 
> all of the fields that are also being passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe 
> this can be done with changes to the parser and the creation of a new 
> LOUpdate. No physical plan changes should be needed because we will leverage 
> what LOGenerate does.
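The proposed semantics can be modeled with a minimal Python sketch. An alias that matches an existing field is replaced in place, and any other alias is appended to the end of the tuple. The example output implies that every expression is evaluated against the original input row, because {{new_sum}} is 1+2=3 rather than 5+2. The sketch assumes that reading. The function name {{foreach_update}} and the representation of expressions as (alias, function) pairs are hypothetical. This models only the observable behavior, not the proposed LOUpdate implementation.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Update = Tuple[str, Callable[[Row], int]]


def foreach_update(schema: List[str], rows: List[tuple],
                   updates: List[Update]) -> Tuple[List[str], List[tuple]]:
    """Hypothetical model of FOREACH ... UPDATE.

    Aliases matching an existing field are replaced in place; any other
    alias is appended (once, in order of first appearance) to the tuple.
    Every expression reads the original input row, not earlier updates.
    """
    new_fields = list(dict.fromkeys(a for a, _ in updates if a not in schema))
    out_schema = schema + new_fields
    result = []
    for values in rows:
        original = dict(zip(schema, values))  # input values seen by expressions
        current = dict(original)
        for alias, expr in updates:
            current[alias] = expr(original)
        result.append(tuple(current[name] for name in out_schema))
    return out_schema, result


schema = ["f1", "f2", "f3"]
rows = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
updates = [
    ("f1", lambda r: 5),                       # 5 as f1
    ("new_sum", lambda r: r["f1"] + r["f2"]),  # f1+f2 as new_sum
]
out_schema, updated = foreach_update(schema, rows, updates)
print(out_schema)  # ['f1', 'f2', 'f3', 'new_sum']
print(updated)     # [(5, 2, 3, 3), (5, 3, 4, 5), (5, 4, 5, 7)]
```

The printed tuples match the {{Dump updated;}} output in the description. If expressions instead saw earlier updates within the same statement, {{new_sum}} would come out as 7, 8, 9.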



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
