[ https://issues.apache.org/jira/browse/PIG-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980720#action_12980720 ]
Gerrit Jansen van Vuuren commented on PIG-1717:
-----------------------------------------------

Yes, I've used Hadoop path globbing, but it has its own problems. For example, if the files are three levels down in a partitioning scheme, e.g.

/mylog/year=2011/month=01/day=01/hour=00

then the file glob has to be

/mylog/year=2011/month=01/day*/hour*/*

It's doable, but not friendly when writing scripts, and it's very easy to forget to add the extra * for each level.

Apart from that, letting script writers filter by partitions as if they were just another column in the schema is a great advantage: it keeps things simpler, they don't need to think or learn about file globbing, scripts are much cleaner, and it's easy to read what the script is doing.

Thanks for the approval. I'll redo the patch with the required comments and references to this JIRA, then try to find an obvious place to insert the information into the wiki page and make a patch for that as well.

> pig needs to call setPartitionFilter if schema is null but getPartitionKeys
> is not
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-1717
>                 URL: https://issues.apache.org/jira/browse/PIG-1717
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1717.patch
>
>
> I'm writing a loader that works with hive style partitioning e.g.
> /logs/type1/daydate=2010-11-01
> The loader does not know the schema upfront and this is something that the
> user adds in the script using the AS clause.
> The problem is that this user defined schema is not available to the loader,
> so the loader cannot return any schema. The loader does know what the
> partition keys are, and pig needs in some way to know about these partition
> keys.
> Currently if the schema is null pig never calls the
> LoadMetaData:getPartitionKeys method or the setPartitionFilter method.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.