AllLoader in trunk does not work properly with JSON schemas
-----------------------------------------------------------

                 Key: PIG-2099
                 URL: https://issues.apache.org/jira/browse/PIG-2099
             Project: Pig
          Issue Type: Bug
          Components: data
    Affects Versions: 0.8.1, 0.8.0
            Reporter: Chris Pesto


The AllLoader in the Piggybank in trunk does not pass JSON-defined schemas to 
the child loaders it instantiates.  If the schema is defined in the LOAD 
function, when Pig calls getSchema on the AllLoader the AllLoader instantiates 
the child loader and calls the child's getSchema if it respects the 
LoadMetadata interface.  If the AllLoader finds the JSON schema in a file, it 
does not instantiate the child loader until prepareToRead is called, and the 
child does not receive the schema.  I have hacked this in by adding to the 
AllLoader:

        transient String location = null;
        transient Job job = null;

then in AllLoader::setLocation:

        this.location = location;
        this.job = job;

then in AllLoader::prepareToRead:

        if (childLoadFunc instanceof LoadMetadata) {
                ((LoadMetadata) childLoadFunc).getSchema(location, job);
        }

Although I suspect it is not good practice to store the location/job in the 
class variables like that, I don't know a better way to fix this.

------

Also, getFuncSpecFromContent in the accompanying LoadFuncHelper class with the 
AllLoader should be modified:

        funcSpec = new 
FuncSpec("org.apache.pig.piggybank.storage.PigStorageSchema()");

since it currently instantiates a normal PigStorage object, which does not 
understand pre-defined schemas.  The documentation for the AllLoader should 
reference PigStorageSchema instead of PigStorage as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to