Hey Edward,

Can you just treat the files as external tables?

Later,
Jeff

On Sun, Apr 19, 2009 at 8:24 AM, Edward Capriolo <[email protected]> wrote:

> On Sun, Apr 19, 2009 at 3:19 AM, Dhruba Borthakur <[email protected]> wrote:
> > HADOOP-4044 is scheduled to finally make it to 0.21 release. And 0.21 is
> > still a while away.
> >
> > That said, if one imports a data-set (set of files, or directory) into a
> > warehouse, isn't it safer to move that dataset into the warehouse itself
> > rather than letting it sit outside? For one thing, the target of the
> > symlink might not be accessible to all hadoop slave nodes.
> >
> > -dhruba
> >
> > On Sat, Apr 18, 2009 at 7:41 PM, Edward Capriolo <[email protected]> wrote:
> >
> >> I was looking at HADOOP-4044. It would be nice to be able to work on
> >> files without moving them into the warehouse. Could a SerDe handle a
> >> similar task?
> >
>
> Yes, it would be safer to move it inside.
>
> The reason I would like to do this is that in our deployment, map reduce
> programs are creating files outside of the warehouse. I do not want to
> move them into the warehouse and I do not want to copy them. Being
> able to 'symlink' would allow me to assemble virtual tables without
> moving data or changing the flow of an already existing process.
>
> So I am only looking to symlink to other files in the same filesystem.
> On the extreme end, a symlink to an external resource could be very
> useful too, but that is not what I was thinking of.
