[ 
https://issues.apache.org/jira/browse/PIG-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-2824:
------------------------

    Attachment: 2824.patch

Attached an initial patch that introduces a new interface, LoadCheckSchema (any 
better naming?), which declares that a loader can check the number of fields 
itself, i.e. pad null fields or throw away extra fields as necessary. If a 
given loader does not implement this interface, it is assumed to be unable to 
check fields, and Pig will use a FOREACH to project all fields, which can be 
more expensive. Initially only PigStorage implements this interface.
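
To make the idea concrete, here is a minimal sketch of what "checking the 
number of fields inside the loader" could look like. It is not taken from 
2824.patch: the method setExpectedFieldCount, the class ToyDelimitedLoader and 
the demo class are hypothetical names used only for illustration, and the toy 
loader works on plain strings rather than Pig tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Interface name comes from the comment above; the method is an assumption,
// not the signature used in the actual patch.
interface LoadCheckSchema {
    /** Tells the loader how many fields every returned record must have. */
    void setExpectedFieldCount(int n);
}

// Toy stand-in for a delimited-text loader such as PigStorage (not the real class).
class ToyDelimitedLoader implements LoadCheckSchema {
    private int expected = -1;          // -1 means "no schema given, leave records alone"
    private final String delimiter;

    ToyDelimitedLoader(String delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public void setExpectedFieldCount(int n) {
        this.expected = n;
    }

    List<String> parse(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] raw = line.split(Pattern.quote(delimiter), -1);
        List<String> fields = new ArrayList<>(Arrays.asList(raw));
        if (expected < 0) {
            return fields;
        }
        // Too few fields: pad with nulls.
        while (fields.size() < expected) {
            fields.add(null);
        }
        // Too many fields: throw the extras away.
        if (fields.size() > expected) {
            fields = new ArrayList<>(fields.subList(0, expected));
        }
        return fields;
    }
}

class FieldCountDemo {
    public static void main(String[] args) {
        ToyDelimitedLoader loader = new ToyDelimitedLoader(",");
        loader.setExpectedFieldCount(3);
        System.out.println(loader.parse("a,b"));      // [a, b, null]
        System.out.println(loader.parse("a,b,c,d"));  // [a, b, c]
    }
}
```

Because the padding and truncation happen while the record is being built, no 
separate projection pass over the loaded data is needed for this check.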

Adjusted (not reverted) the unit tests modified by PIG-1188 by adding type info 
to the schema, so those tests won't be affected by this optimization now (or in 
the future if it gets disabled). Also added a few tests verifying that FOREACH 
is not generated if all types are bytearray.
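
The rule those tests exercise can be sketched roughly as below. This is only my 
reading of the behaviour described in this comment, not code from the patch: 
needsForeach, ChecksFieldCount, CheckingLoader and PlainLoader are hypothetical 
names, and types are represented as plain strings for simplicity.

```java
import java.util.Arrays;
import java.util.List;

class ForeachDecisionSketch {
    // Hypothetical stand-in for the LoadCheckSchema interface.
    interface ChecksFieldCount {}

    // A loader that can pad/truncate fields itself (PigStorage in the patch).
    static class CheckingLoader implements ChecksFieldCount {}

    // A loader that does not implement the interface.
    static class PlainLoader {}

    /**
     * Decide whether a FOREACH has to be inserted after the load.
     * declaredTypes is null when the user gave no schema at all
     * (assumed here to need no field-count check).
     */
    static boolean needsForeach(Object loader, List<String> declaredTypes) {
        if (declaredTypes == null) {
            return false;
        }
        boolean allByteArray = declaredTypes.stream().allMatch("bytearray"::equals);
        if (!allByteArray) {
            // Typed schema: the FOREACH is needed for casting anyway.
            return true;
        }
        // Untyped schema: FOREACH only if the loader cannot check fields itself.
        return !(loader instanceof ChecksFieldCount);
    }

    public static void main(String[] args) {
        List<String> untyped = Arrays.asList("bytearray", "bytearray");
        List<String> typed = Arrays.asList("int", "chararray");
        System.out.println(needsForeach(new CheckingLoader(), untyped)); // false
        System.out.println(needsForeach(new PlainLoader(), untyped));    // true
        System.out.println(needsForeach(new CheckingLoader(), typed));   // true
    }
}
```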

Any comments are appreciated.
                
> Pushing checking number of fields into LoadFunc
> -----------------------------------------------
>
>                 Key: PIG-2824
>                 URL: https://issues.apache.org/jira/browse/PIG-2824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.0, 0.10.0
>            Reporter: Jie Li
>         Attachments: 2824.patch, 2824.png
>
>
> As described in PIG-1188, if users define a schema (with or without types), 
> we need to check the number of fields after loading data: if there are fewer 
> fields we need to pad null fields, and if there are more fields we need to 
> throw them away. 
> For a schema with types, Pig used to insert a Foreach after the loader for 
> type casting, which also checks #fields. For a schema without types there was 
> no such Foreach, so PIG-1188 inserted one just for checking #fields. 
> Unfortunately, Foreach is too expensive for such checking, and ideally we can 
> push it into the loader.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
