[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596462#comment-14596462 ]

Kevin J. Price commented on PIG-4608:
-------------------------------------

Several of us actually discussed this at some length, and didn't think it was 
worth differentiating between modified columns and appended columns in the 
command. Two ideas we had:
# A token, like you have, indicating that the remaining fields are being added. 
We were considering using an 'ADD' keyword. As in:
{code}
updated = FOREACH three_numbers UPDATE 3 AS f3, 6 AS f6 ADD f1+f2 AS new_sum;
{code}
# Separate statements for 'strict' versus 'non-strict' mode. For example, for 
updating without appending you would use
{code}
updated = FOREACH three_numbers UPDATE_STRICT 3 AS f3, 6 AS f6;
{code}
and for updating with appending, you could use
{code}
updated = FOREACH three_numbers UPDATE 3 AS f3, 6 AS f6, f1+f2 AS new_sum;
{code}

However, our overall view from writing Pig scripts is that very few people 
would ever want to use the strict mode, and we saw little value in having the 
extra token (ADD or ...) to separate out appended columns. From a programming 
viewpoint, it simply makes more logical sense to us to treat it as an implicit 
"update or add" construct.
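A minimal Python model of the "update or add" semantics, with an optional strict mode for comparison, may make the trade-off concrete. It is not Pig code: `foreach_update`, the dict-based rows, and the `strict` flag are hypothetical names used only for illustration. It reuses the example from the issue description, whose expected output shows that every expression is evaluated against the original row (new_sum is 1+2, not 5+2).

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# (alias, expression) pairs, in the order they appear in the UPDATE list.
Update = Tuple[str, Callable[[Row], object]]


def foreach_update(rows: List[Row], fields: List[str], updates: List[Update],
                   strict: bool = False) -> Tuple[List[str], List[tuple]]:
    """Hypothetical model of FOREACH ... UPDATE (aliases assumed unique).

    An alias matching an existing field replaces it in place; any other alias
    is appended after the existing fields. With strict=True (the UPDATE_STRICT
    idea), an alias that matches no existing field is an error instead.
    """
    appended = [alias for alias, _ in updates if alias not in fields]
    if strict and appended:
        raise ValueError(f"unknown field(s) in strict update: {appended}")
    out_fields = fields + appended
    out_rows = []
    for row in rows:
        # Every expression sees the ORIGINAL row, so f1+f2 uses the old f1
        # even though f1 itself is being overwritten.
        new_values = {alias: expr(row) for alias, expr in updates}
        out_rows.append(tuple(new_values[f] if f in new_values else row[f]
                              for f in out_fields))
    return out_fields, out_rows


# The example from the issue: overwrite f1, append new_sum.
fields = ["f1", "f2", "f3"]
three_numbers = [dict(zip(fields, t)) for t in [(1, 2, 3), (2, 3, 4), (3, 4, 5)]]
schema, result = foreach_update(
    three_numbers, fields,
    [("f1", lambda r: 5), ("new_sum", lambda r: r["f1"] + r["f2"])],
)
print(schema)  # ['f1', 'f2', 'f3', 'new_sum']
print(result)  # [(5, 2, 3, 3), (5, 3, 4, 5), (5, 4, 5, 7)]
```

Passing strict=True with the same update list raises ValueError, because new_sum matches no existing field; that is the extra friction the comment argues most script authors would not want.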

> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH … GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
> USING PigStorage()
> AS (f1:int, f2:int, f3:int);
> -- Overwrite f1 and append the sum of f1 and f2
> updated = FOREACH three_numbers UPDATE
> 5 as f1,
> f1+f2 as new_sum
> ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do 
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large 
> number of fields (in the 20-200 range), where we often need to modify only a 
> few of them. The FOREACH ... UPDATE statement allows the developer to focus 
> on the actual logical changes instead of having to list all of the fields 
> that are simply passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe 
> this can be done with changes to the parser and the creation of a new 
> LOUpdate. No physical plan changes should be needed because we will leverage 
> what LOGenerate does.
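To illustrate why reusing LOGenerate seems plausible, here is a small Python sketch that rewrites an UPDATE list into the FOREACH ... GENERATE statement one would write by hand today. `desugar_update` is a hypothetical helper that works on source text; the prototype described above works on the logical plan via a new LOUpdate, which this does not model.

```python
from typing import List, Tuple


def desugar_update(relation: str, fields: List[str],
                   updates: List[Tuple[str, str]]) -> str:
    """Rewrite FOREACH ... UPDATE into an equivalent FOREACH ... GENERATE.

    `updates` holds (expression, alias) pairs as written in the UPDATE list.
    Hypothetical, text-level helper for illustration only.
    """
    by_alias = {alias: expr for expr, alias in updates}
    items = []
    for f in fields:
        # Updated fields keep their position; untouched fields pass through.
        items.append(f"{by_alias[f]} AS {f}" if f in by_alias else f)
    for expr, alias in updates:
        if alias not in fields:
            items.append(f"{expr} AS {alias}")  # unmatched aliases are appended
    return f"FOREACH {relation} GENERATE {', '.join(items)};"


print(desugar_update("three_numbers", ["f1", "f2", "f3"],
                     [("5", "f1"), ("f1+f2", "new_sum")]))
# FOREACH three_numbers GENERATE 5 AS f1, f2, f3, f1+f2 AS new_sum;
```

Because GENERATE already evaluates every expression against the input tuple, the rewritten statement produces the same (5,2,3,3), (5,3,4,5), (5,4,5,7) output as the UPDATE example; the saving for wide schemas is that the pass-through list is generated rather than typed.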



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
