On Sun, Apr 19, 2009 at 3:19 AM, Dhruba Borthakur <[email protected]> wrote:
> HADOOP-4044 is scheduled to finally make it to 0.21 release. And 0.21 is
> still a while away.
>
> That said, if one imports a data-set (set of files, or directory) into a
> warehouse, isn't it safer to move that dataset into the warehouse itself
> rather than letting it sit outside. For one thing, the target of the symlink
> might not be accessible to all hadoop slave nodes.
>
> -dhruba
>
> On Sat, Apr 18, 2009 at 7:41 PM, Edward Capriolo <[email protected]> wrote:
>
>> I was looking at HADOOP-4044. It would be nice to be able to work on
>> files without moving them into the warehouse. Could a SerDe handle a
>> similar task?

Yes, it would be safer to move it inside. The reason I would like to do this is that in our deployment, map reduce programs are creating files outside of the warehouse. I do not want to move them into the warehouse, and I do not want to copy them. Being able to 'symlink' would allow me to assemble virtual tables without moving data or changing the flow of an already existing process. So I am only looking to symlink to other files in the same filesystem. At the extreme end, a symlink to an external resource could be very useful too, but that is not what I was thinking of.
