[ 
https://issues.apache.org/jira/browse/PIG-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980720#action_12980720
 ] 

Gerrit Jansen van Vuuren commented on PIG-1717:
-----------------------------------------------


Yes, I've used Hadoop path globbing, but it has its own problems. For example,
if the files are three levels down in a partitioning scheme such as
/mylog/year=2011/month=01/day=01/hour=00, then the glob has to be
/mylog/year=2011/month=01/day*/hour*/*
It's doable, but not friendly when writing scripts, and it's very easy to
forget to add the extra * for each level.
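
To illustrate why the glob is easy to get wrong, here is a minimal Python
sketch that builds such a pattern. It is not part of the patch or of Pig;
partition_glob and the KEYS list are hypothetical names made up for this
example. The point is only that every partition level you do not pin needs
its own wildcard segment, plus one more for the files themselves.

```python
def partition_glob(root, keys, fixed):
    """Build a glob with one path segment per partition key,
    followed by a final segment that matches the files.

    root  -- base directory of the partitioned data
    keys  -- partition keys in directory order
    fixed -- dict of key -> value for the levels that are pinned
    """
    segments = [
        f"{k}={fixed[k]}" if k in fixed else f"{k}=*"
        for k in keys
    ]
    return "/".join([root.rstrip("/")] + segments + ["*"])


# Hypothetical layout taken from the example path above.
KEYS = ["year", "month", "day", "hour"]

pattern = partition_glob("/mylog", KEYS, {"year": "2011", "month": "01"})
print(pattern)  # /mylog/year=2011/month=01/day=*/hour=*/*
```

Dropping any one of the trailing wildcard segments produces a pattern that
stops at a directory level rather than at the files, which is the mistake
described above.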

Apart from that, letting script writers filter by partitions as if they were
just another column in the schema is a great advantage. It keeps things
simpler: they don't need to think or learn about file globbing, the scripts
are much cleaner, and it's easy to read what a script is doing.
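
As a rough sketch of the idea (plain Python, not the loader's actual
implementation; partition_values and the sample paths are assumptions made
for this example), the key=value directory names can be turned into named
values and then filtered like ordinary columns:

```python
def partition_values(path):
    """Extract key=value directory segments from a Hive-style path."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


# Hypothetical partition directories following the layout above.
paths = [
    "/mylog/year=2011/month=01/day=01/hour=00",
    "/mylog/year=2011/month=01/day=02/hour=13",
    "/mylog/year=2011/month=02/day=01/hour=00",
]

# Filter on partition keys as if they were ordinary columns.
selected = [
    p for p in paths
    if partition_values(p)["month"] == "01"
    and partition_values(p)["hour"] == "00"
]
print(selected)  # ['/mylog/year=2011/month=01/day=01/hour=00']
```

The filter names the keys it cares about and says nothing about directory
depth, which is what makes the column-style approach easier to read than the
equivalent glob.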

Thanks for the approval. I'll redo the patch with the required comments and
references to this JIRA, then try to find an obvious place to insert the
information into the wiki page and make a patch for that as well.




> pig needs to call setPartitionFilter if schema is null but getPartitionKeys 
> is not
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-1717
>                 URL: https://issues.apache.org/jira/browse/PIG-1717
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1717.patch
>
>
> I'm writing a loader that works with Hive-style partitioning, e.g. 
> /logs/type1/daydate=2010-11-01
> The loader does not know the schema up front; the user supplies it in the 
> script using the AS clause.
> The problem is that this user-defined schema is not available to the loader, 
> so the loader cannot return any schema. The loader does know what the 
> partition keys are, however, and Pig needs some way to learn about them. 
> Currently, if the schema is null, Pig never calls the 
> LoadMetadata.getPartitionKeys method or the setPartitionFilter method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
