The logic inside my exec() function is different from that of
FILTERFROMFILE.java, but otherwise my class differs very little, apart
from the fact that it takes two parameters. The other difference, and the
one that causes my FilterFunc implementation to fail, is the override of
getArgToFuncMapping(). I don't really need it, so I've commented it out
and everything works fine now. I'm not sure why the override was a
problem, however.
/* (non-Javadoc)
 * @see org.apache.pig.EvalFunc#getArgToFuncMapping()
 * This is needed to make sure that both bytearrays and chararrays can
 * be passed as arguments
 */
@Override
public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
    List<FuncSpec> funcList = new ArrayList<FuncSpec>();
    funcList.add(new FuncSpec(this.getClass().getName(),
            new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
    return funcList;
}
-Sean
Alan Gates wrote:
Can you include the load function from your script to show how you're
using it? One issue is that you cannot define constructor arguments
for your load function in DEFINE, you have to do it in LOAD, USING
X(args go here). Also, the load function is called on the user's box
with arguments passed to it in the USING clause. It is then
serialized and passed to the Hadoop machines, where it is
deserialized. At this point the default constructor is called
(because that's how Java deserializes objects). So if those
constructor arguments are needed on the backend they need to be cached
when the function is constructed on the front end. So you may need to
add logic to explicitly store the filename so it's available at run time.
Alan.
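
To make the point about keeping the file name available at run time concrete, here is a minimal sketch in plain Java. It is not Pig code: ListFilterSketch, its listPath field, and the loadList() helper are hypothetical names, and the class deliberately does not extend FilterFunc so that it compiles on its own. The constructor only records the file name; the list is read on the first exec() call. If the instance was built with the no-arg constructor, it fails with an explicit message rather than the "Can not create a Path from a null string" error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a filter UDF; it does not extend any Pig class.
final class ListFilterSketch {
    // Set by the parameterized constructor; null if the no-arg constructor ran.
    private final String listPath;
    // Loaded lazily on the first exec() call, not at construction time.
    private Set<String> allowed;

    // No-arg constructor: no file name is available in this case.
    ListFilterSketch() {
        this(null);
    }

    // Parameterized constructor: only remembers the file name, does no I/O.
    ListFilterSketch(String listPath) {
        this.listPath = listPath;
    }

    // Returns true when the value appears as a line in the list file.
    boolean exec(String value) throws IOException {
        if (allowed == null) {
            allowed = loadList();
        }
        return allowed.contains(value);
    }

    private Set<String> loadList() throws IOException {
        if (listPath == null) {
            // Fail early with an explicit message instead of passing null on.
            throw new IllegalStateException(
                "no list file name: instance was built with the no-arg constructor");
        }
        return new HashSet<>(
            Files.readAllLines(Paths.get(listPath), StandardCharsets.UTF_8));
    }
}
```

In a real UDF the file would be read through the Hadoop file system API rather than java.nio, and whether the parameterized constructor is actually invoked on the backend depends on the Pig build in use, so treat this only as an illustration of the pattern.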
On Apr 20, 2009, at 2:27 PM, Sean Timm wrote:
PIG-546 indicates that it is now possible to pass arguments into a
custom UDF filter function via a parameterized constructor. I'm
using a TRUNK build from April 1 (svn rev. 761067) which appears to
have the patch applied, but I'm getting the same errors that the
patch describes. Should this work? Is there a better way to pass
parameters/configuration into a UDF filter function?
The parameterized constructor is called 3 times, followed by the
default constructor being called 4 times.
On the Hadoop backend:
2009-04-20 17:11:29,935 ERROR com.aol.search.pig.udf.ValidateQuery: default constructor
2009-04-20 17:11:30,034 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.IllegalArgumentException: Can not create a Path from a null string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
    at org.apache.hadoop.fs.Path.<init>(Path.java:90)
    at com.aol.search.pig.udf.ValidateQuery.loadList(ValidateQuery.java:74)
    at com.aol.search.pig.udf.ValidateQuery.init(ValidateQuery.java:66)
    at com.aol.search.pig.udf.ValidateQuery.exec(ValidateQuery.java:91)
    at com.aol.search.pig.udf.ValidateQuery.exec(ValidateQuery.java:35)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:251)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:217)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
Thanks,
Sean