AllLoader in trunk does not work properly with JSON schemas
-----------------------------------------------------------
Key: PIG-2099
URL: https://issues.apache.org/jira/browse/PIG-2099
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.8.1, 0.8.0
Reporter: Chris Pesto
The AllLoader in the Piggybank in trunk does not pass JSON-defined schemas to
the child loaders it instantiates. If the schema is defined in the LOAD
function, Pig calls getSchema on the AllLoader, which instantiates the child
loader and calls the child's getSchema, provided the child implements the
LoadMetadata interface. If the AllLoader instead finds the JSON schema in a
file, it does not instantiate the child loader until prepareToRead is called,
so the child never receives the schema. I have hacked around this by adding
two fields to the AllLoader:
    transient String location = null;
    transient Job job = null;
then in AllLoader::setLocation:
    this.location = location;
    this.job = job;
then in AllLoader::prepareToRead:
    if (childLoadFunc instanceof LoadMetadata) {
        ((LoadMetadata) childLoadFunc).getSchema(location, job);
    }
Although I suspect it is not good practice to store the location and job in
instance fields like that, I don't know a better way to fix this.
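To make the workaround above concrete, here is a minimal self-contained sketch
of the pattern. It does not use Pig's real classes: SketchLoadFunc,
SketchLoadMetadata, SchemaAwareChild and SketchAllLoader are hypothetical
stand-ins, a String stands in for the schema and an Object for the Job, and the
child is handed to prepareToRead directly rather than being created from the
file contents as the AllLoader does.

```java
// Hypothetical stand-ins for Pig's LoadFunc and LoadMetadata; NOT the real Pig API.
interface SketchLoadFunc {
    void setLocation(String location, Object job);
}

interface SketchLoadMetadata {
    // Assumption: a String stands in for the schema object the real call returns.
    String getSchema(String location, Object job);
}

// A child loader that only learns its schema when getSchema is called on it.
class SchemaAwareChild implements SketchLoadFunc, SketchLoadMetadata {
    String schema; // stays null until getSchema has run

    public void setLocation(String location, Object job) {
        // nothing to do in this sketch
    }

    public String getSchema(String location, Object job) {
        // Pretend to read a JSON schema stored next to the data.
        schema = "schema-for:" + location;
        return schema;
    }
}

// Mirrors the workaround: remember location/job, forward them once the child exists.
class SketchAllLoader implements SketchLoadFunc {
    private transient String location = null;
    private transient Object job = null;
    SketchLoadFunc childLoadFunc;

    public void setLocation(String location, Object job) {
        this.location = location;
        this.job = job;
    }

    // Assumption: the child is passed in; the real loader builds it from the file.
    void prepareToRead(SketchLoadFunc child) {
        childLoadFunc = child;
        childLoadFunc.setLocation(location, job);
        if (childLoadFunc instanceof SketchLoadMetadata) {
            // Late schema hand-off, as in the hack described above.
            ((SketchLoadMetadata) childLoadFunc).getSchema(location, job);
        }
    }
}

class SchemaForwardingDemo {
    public static void main(String[] args) {
        SketchAllLoader loader = new SketchAllLoader();
        loader.setLocation("/data/logs", new Object());
        SchemaAwareChild child = new SchemaAwareChild();
        loader.prepareToRead(child);
        System.out.println(child.schema); // prints "schema-for:/data/logs"
    }
}
```

The sketch only shows the ordering problem: the child is created after
setLocation has run, so the location and job have to be remembered somewhere
until prepareToRead can forward them.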
------
Also, getFuncSpecFromContent in the LoadFuncHelper class that accompanies the
AllLoader should be modified to use:
    funcSpec = new FuncSpec("org.apache.pig.piggybank.storage.PigStorageSchema()");
since it currently instantiates a plain PigStorage object, which does not
understand pre-defined schemas. The documentation for the AllLoader should
reference PigStorageSchema instead of PigStorage as well.