zaki rahaman
Mon, 15 Mar 2010 18:35:33 -0700
I'm not sure I understand. Of course you can specify partial filename
matches or patterns as well as directory globs... I'm not sure why you
need to reinvent the wheel here, so to speak.

On Mon, Mar 15, 2010 at 8:58 PM, jiang licht <licht_ji...@yahoo.com> wrote:

> Thanks, Alan.
>
> That is what we are doing right now. But sometimes, we only want to include
> some files in one folder and you cannot simply use regular expression on
> file names to separate what you want from what you don't want. That's why we
> want a generic solution.
>
> This is helpful since it's often reasonable to keep only one copy of a big
> data set. Then when someone needs to do some analysis on a subset, he only
> needs to fill out a list of files in the subset and uses the load function
> to load them from the list (symlink may do the job but not available in fs
> shell). From previous post, this seems to be simple. But I haven't found
> time to actually look at how to write such a function. Is there some sample
> code out there and any hints for doing this?
>
> Thanks!
>
> Michael
>
> --- On Mon, 3/15/10, Alan Gates <ga...@yahoo-inc.com> wrote:
>
> From: Alan Gates <ga...@yahoo-inc.com>
> Subject: Re: Custom load function?
> To: pig-user@hadoop.apache.org
> Date: Monday, March 15, 2010, 1:45 PM
>
> PigStorage (the default load function) takes Hadoop regular expressions.
> So as long as you can express these files in a valid Hadoop regular
> expression it should work fine.
>
> Alan.
>
> On Mar 9, 2010, at 7:56 PM, jiang licht wrote:
>
> > Before I read the example, here's a simple thing that I want to know how
> > to implement but not sure: I have a list of files which are scattered in
> > different folders in a hadoop cluster, instead of firing multiple "load" to
> > read each file, I want to put the full path names of these files on a list
> > and then have a load function that can take the file name of the list as an
> > argument and then load these files ...
> >
> > Thanks,
> >
> > Michael

--
Zaki Rahaman
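
For readers following the thread, the list-of-files idea can be approximated without a custom load function by expanding the list into a single glob before the Pig script runs. What follows is only a minimal sketch, not something posted in the thread. It assumes the list is plain text with one full path per line, and that the load function accepts Hadoop's `{a,b}` glob alternation with full paths inside the braces, which may depend on the Hadoop version and should be verified on the cluster. The helper names `read_path_list`, `paths_to_glob`, and `load_statement`, and the sample paths, are hypothetical and not part of Pig.

```python
def read_path_list(lines):
    """Return the non-empty, non-comment paths from an iterable of lines."""
    paths = []
    for line in lines:
        path = line.strip()
        # Skip blank lines and lines starting with '#' (a convention assumed here).
        if path and not path.startswith("#"):
            paths.append(path)
    return paths


def paths_to_glob(paths):
    """Join paths into one alternation glob of the form {p1,p2,...}."""
    if not paths:
        raise ValueError("path list is empty")
    for path in paths:
        # Commas, braces, and single quotes would change the meaning of the
        # glob or of the quoted Pig string, so reject them outright.
        if any(ch in path for ch in ",{}'"):
            raise ValueError("unsupported character in path: %r" % path)
    if len(paths) == 1:
        return paths[0]
    return "{" + ",".join(paths) + "}"


def load_statement(alias, paths):
    """Build a Pig LOAD statement string for the given alias and paths."""
    return "%s = LOAD '%s' USING PigStorage();" % (alias, paths_to_glob(paths))


if __name__ == "__main__":
    # Hypothetical list contents; in practice these lines would come from a file.
    sample = [
        "/data/2010/01/part-00000\n",
        "\n",
        "# a comment line\n",
        "/data/2010/03/part-00007\n",
    ]
    print(load_statement("subset", read_path_list(sample)))
```

The printed statement could then be pasted into a Pig script or passed in through parameter substitution. A custom load function that reads the list itself would avoid this preprocessing step, but its interface differs between Pig versions, so it is not sketched here.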