Haley Thrapp created PIG-4608:
---------------------------------

             Summary: FOREACH ... UPDATE
                 Key: PIG-4608
                 URL: https://issues.apache.org/jira/browse/PIG-4608
             Project: Pig
          Issue Type: New Feature
            Reporter: Haley Thrapp


I would like to propose a new command in Pig, FOREACH...UPDATE.

Syntactically, it would look much like FOREACH … GENERATE.

Example:

Input data:
(1,2,3)
(2,3,4)
(3,4,5)

-- Load the data
three_numbers = LOAD 'input_data'
USING PigStorage()
AS (f1:int, f2:int, f3:int);

-- Sum up the row
updated = FOREACH three_numbers UPDATE
5 as f1,
f1+f2 as new_sum
;

Dump sums;
(5,2,3,3)
(5,3,4,5)
(5,4,5,7)

Fields to update must be specified by alias. Any fields in the UPDATE that do 
not match an existing field will be appended to the end of the tuple.

This command is particularly desirable in scripts that deal with a large number 
of fields (in the 20-200 range). Often, we need to only make modifications to a 
few fields. The FOREACH ... UPDATE statement, allows the developer to focus on 
the actual logical changes instead of having to list all of the fields that are 
also being passed through.

My team has prototyped this with changes to FOREACH ... GENERATE. We believe 
this can be done with changes to the parser and the creation of a new LOUpdate. 
No physical plan changes should be needed because we will leverage what 
LOGenerate does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to