support expression of indirect dependency between statements
-------------------------------------------------------------
Key: PIG-2212
URL: https://issues.apache.org/jira/browse/PIG-2212
Project: Pig
Issue Type: New Feature
Reporter: Thejas M Nair
There needs to be a better way to support the following use case than using
exec. It should be possible to express a dependency between a store statement
and another statement, to ensure that the store happens first. It is also worth
considering whether allowing the user to specify a dependency between any two
statements would be useful.
{noformat}
I have some data that I would like to store into a file and then load it in a
UDF to do some operations in the next pig statement.
For example,
doc_ids = FOREACH docs GENERATE doc_id;
STORE doc_ids INTO '$TEMP';
modifieddocs = FOREACH docs GENERATE myUDF('$TEMP', doc_id);
where myUDF loads doc_ids stored in '$TEMP' and does some operation using $TEMP
and doc_id.
Now I need to make sure that the "STORE doc_ids INTO '$TEMP';" occurs before
the FOREACH statement,
so that loading the index occurs smoothly. Is there any way to guarantee that
this happens?
{noformat}
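A minimal sketch of the exec-based workaround mentioned above, applied to this
use case. It assumes that a bare exec; statement runs everything queued before
it and returns only once those jobs have finished, so the store is complete
before the UDF reads the file. The INPUT and OUTPUT parameters, the schema of
docs, and the registration of myUDF are hypothetical, since the report does not
show them.
{noformat}
-- Hypothetical input; the report does not show how docs is loaded.
docs = LOAD '$INPUT' AS (doc_id:chararray, contents:chararray);
doc_ids = FOREACH docs GENERATE doc_id;
STORE doc_ids INTO '$TEMP';

-- Barrier: run the statements above before going any further.
exec;

-- docs is loaded again so the sketch does not depend on aliases
-- defined before the barrier still being available.
docs = LOAD '$INPUT' AS (doc_id:chararray, contents:chararray);
-- myUDF is assumed to be registered elsewhere in the script.
modifieddocs = FOREACH docs GENERATE myUDF('$TEMP', doc_id);
STORE modifieddocs INTO '$OUTPUT';
{noformat}
Used this way, exec acts as a barrier for the whole script rather than stating
that one statement depends on one particular store, which is the gap this
request is about.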