New logical plan failing with ERROR 2229: Couldn't find matching uid -1
------------------------------------------------------------------------
Key: PIG-1979
URL: https://issues.apache.org/jira/browse/PIG-1979
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Vivek Padmanabhan
Below is my script:
{code}
register myudf.jar;
c01 = LOAD 'input' USING org.test.MyTableLoader('');
c02 = FILTER c01 BY result == 'OK' AND formatted IS NOT NULL AND formatted != '' ;
c03 = FOREACH c02 GENERATE url, formatted, FLATTEN(usage);
c04 = FOREACH c03 GENERATE usage::domain AS domain, url, formatted;
doc_001 = FOREACH c04 GENERATE domain,url, FLATTEN(MyExtractor(formatted)) AS category;
doc_004_1 = GROUP doc_001 BY (domain,url);
doc_005 = FOREACH doc_004_1 GENERATE group.domain as domain, group.url as url, doc_001.category as category;
STORE doc_005 INTO 'out_final' USING PigStorage();
review1 = FOREACH c04 GENERATE domain,url, MyExtractor(formatted) AS rev;
review2 = FILTER review1 BY SIZE(rev)>0;
joinresult = JOIN review2 by (domain,url), doc_005 by (domain,url);
finalresult = FOREACH joinresult GENERATE doc_005::category;
STORE finalresult INTO 'out_final' using PigStorage();
{code}
The script fails while the plan is being built, when the logical optimizer applies the AddForEach rule:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 106 Input: 0 Column: 5)
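The "Column: 5" in this message is consistent with doc_005::category being the sixth field of joinresult. The sketch below is a plain-Java calculation, not Pig code. It assumes that JOIN emits the left relation's fields followed by the right relation's fields, each prefixed with "alias::", and it uses the field lists the script declares for review2 (domain, url, rev) and doc_005 (domain, url, category).

{code}
import java.util.ArrayList;
import java.util.List;

public class JoinSchemaPosition {
    // Builds the flattened join schema under the stated assumption:
    // left fields first, then right fields, each prefixed with "<alias>::".
    public static List<String> joinSchema(String leftAlias, List<String> left,
                                          String rightAlias, List<String> right) {
        List<String> out = new ArrayList<>();
        for (String f : left) {
            out.add(leftAlias + "::" + f);
        }
        for (String f : right) {
            out.add(rightAlias + "::" + f);
        }
        return out;
    }

    public static void main(String[] args) {
        // Field lists taken from the GENERATE clauses of review1 and doc_005.
        List<String> review2 = List.of("domain", "url", "rev");
        List<String> doc005 = List.of("domain", "url", "category");
        List<String> joined = joinSchema("review2", review2, "doc_005", doc005);
        System.out.println(joined);
        // Zero-based position of the projected field in joinresult.
        System.out.println("doc_005::category is column " + joined.indexOf("doc_005::category"));
    }
}
{code}

Running it prints "doc_005::category is column 5", which matches the column the failing Project refers to.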
The problem occurs when I include doc_005::category in the projection for relation finalresult. This field originates from the UDF org.vivek.udfs.MyExtractor (source given below).
{code}
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.*;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

public class MyExtractor extends EvalFunc<DataBag> {

    // Declares the output as a bag of tuples holding a single chararray field.
    @Override
    public Schema outputSchema(Schema arg0) {
        try {
            return Schema.generateNestedSchema(DataType.BAG, DataType.CHARARRAY);
        } catch (FrontendException e) {
            System.err.println("Error while generating schema. " + e);
            return new Schema(new FieldSchema(null, DataType.BAG));
        }
    }

    // Returns a bag with one tuple whose only field is the first input field
    // concatenated with the input tuple's hash code.
    @Override
    public DataBag exec(Tuple inputTuple) throws IOException {
        try {
            Tuple tp2 = TupleFactory.getInstance().newTuple(1);
            tp2.set(0, inputTuple.get(0).toString() + inputTuple.hashCode());
            DataBag retBag = BagFactory.getInstance().newDefaultBag();
            retBag.add(tp2);
            return retBag;
        } catch (Exception e) {
            throw new IOException(" Caught exception", e);
        }
    }
}
{code}
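To make the UDF's behavior easier to follow without a Pig dependency, here is a minimal plain-Java model of MyExtractor.exec. It assumes a tuple can be represented as a List<Object> and a bag as a List<List<Object>>; these are illustrative stand-ins, not Pig's Tuple and DataBag, and since List.hashCode() differs from Pig's Tuple.hashCode() the generated strings will not match real Pig output. What the model does show is that every successful call returns a bag holding exactly one single-field tuple, so FLATTEN(MyExtractor(formatted)) yields one row per input record and SIZE(rev) is 1.

{code}
import java.util.ArrayList;
import java.util.List;

public class MyExtractorModel {
    // Models MyExtractor.exec with illustrative stand-ins:
    // tuple = List<Object>, bag = List<List<Object>>.
    public static List<List<Object>> exec(List<Object> inputTuple) {
        // One single-field tuple: first input field + the input's hash code.
        List<Object> tp2 = new ArrayList<>();
        tp2.add(inputTuple.get(0).toString() + inputTuple.hashCode());
        List<List<Object>> retBag = new ArrayList<>();
        retBag.add(tp2);
        return retBag;
    }

    public static void main(String[] args) {
        List<Object> input = new ArrayList<>();
        input.add("some formatted text");
        List<List<Object>> bag = exec(input);
        // The bag always contains exactly one tuple with one field.
        System.out.println("bag size = " + bag.size());
        System.out.println("tuple = " + bag.get(0));
    }
}
{code}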
The script runs fine if I disable the AddForEach rule with -t AddForEach.