Relax the condition to find output/input dependency
---------------------------------------------------
Key: PIG-1970
URL: https://issues.apache.org/jira/browse/PIG-1970
Project: Pig
Issue Type: Improvement
Reporter: Daniel Dai
Priority: Minor
Pig creates an output/input dependency when the output generated by a Pig
script feeds a load statement, so that Pig does not launch the two jobs
simultaneously (which would result in an "input file does not exist" error).
For example:
{code}
STORE A INTO '/user/myname/myoutputfolder';
D = LOAD '/user/myname/myoutputfolder';
{code}
The load will run in a map-reduce job after /user/myname/myoutputfolder is
generated. However, currently we only do an exact match on the path. If we
load only part of the data, we cannot figure out the dependency. E.g.:
{code}
STORE A INTO '/user/myname/myoutputfolder';
D = LOAD '/user/myname/myoutputfolder/part*' ;
{code}
We should be more intelligent in detecting this dependency, so that a load
from a path under a stored folder is also recognized.
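
One possible relaxed check, sketched in Python purely as an illustration: it
is not Pig's actual code, and the function name may_depend is hypothetical.
It assumes a purely lexical comparison: the load path depends on the store
path if its leading components match the store path, treating each load
component as a glob. Python's fnmatch does not understand Hadoop's {a,b}
alternation, so that syntax is not covered here.

```python
import fnmatch
import posixpath


def may_depend(store_path, load_path):
    """Return True if a LOAD of load_path could read output of a STORE to store_path.

    Hypothetical helper: compares the paths lexically, component by component.
    """
    # normpath drops trailing and duplicate slashes so '/a/b/' equals '/a/b'.
    store_parts = posixpath.normpath(store_path).split("/")
    load_parts = posixpath.normpath(load_path).split("/")

    # A load path shorter than the store path is not treated as a dependency.
    if len(load_parts) < len(store_parts):
        return False

    # zip stops at the end of the store path, so any extra load components
    # (such as 'part*') are ignored. Each load component may be a glob.
    return all(
        s == l or fnmatch.fnmatchcase(s, l)
        for s, l in zip(store_parts, load_parts)
    )


if __name__ == "__main__":
    store = "/user/myname/myoutputfolder"
    print(may_depend(store, "/user/myname/myoutputfolder"))
    print(may_depend(store, "/user/myname/myoutputfolder/part*"))
    print(may_depend(store, "/user/myname/otherfolder/part*"))
```

With this rule both examples above produce a dependency, while a load from a
sibling folder does not. The check errs on the side of reporting a
dependency, since a false positive only serializes two jobs whereas a false
negative causes the missing-input error.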
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira