Hey Rasit,

I'm not sure I fully understand your description of the problem, but you might want to check out the JIRA ticket for making the replica placement algorithms in HDFS pluggable (https://issues.apache.org/jira/browse/HADOOP-3799) and add your use case there.
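For what it's worth, here's a rough sketch of where your per-user spreading logic could plug in once that ticket is resolved. Everything below is hypothetical: HADOOP-3799 hasn't settled on an API yet, so the Node and PlacementPolicy interfaces are stand-ins I made up to illustrate the idea, not real Hadoop classes, and the /users/<user>/<fileIndex> path layout is just an assumption.

    import java.util.ArrayList;
    import java.util.List;

    // Stand-in for a datanode; a real API would hand you
    // DatanodeDescriptor objects or similar.
    interface Node {
        String getName();
    }

    // Made-up placement hook. The actual pluggable policy from
    // HADOOP-3799 will almost certainly look different.
    interface PlacementPolicy {
        List<Node> chooseTargets(String srcPath, int numReplicas,
                                 List<Node> allNodes);
    }

    // Stripes each user's files round-robin over the cluster:
    // file i of user U starts at node (hash(U) + i) mod N, so one
    // user's data ends up spread evenly across all machines.
    class PerUserRoundRobinPolicy implements PlacementPolicy {
        public List<Node> chooseTargets(String srcPath, int numReplicas,
                                        List<Node> allNodes) {
            // Assumes paths like /users/<user>/<fileIndex>; adjust
            // the parsing to whatever layout you actually use.
            String[] parts = srcPath.split("/");
            String user = parts[2];
            int fileIndex = Integer.parseInt(parts[3]);

            int n = allNodes.size();
            // Double mod keeps the index non-negative even when
            // hashCode() is negative.
            int first = ((user.hashCode() + fileIndex) % n + n) % n;

            List<Node> targets = new ArrayList<Node>();
            for (int r = 0; r < numReplicas && r < n; r++) {
                // Put the extra replicas on the next nodes in the
                // ring, so those are controlled too.
                targets.add(allNodes.get((first + r) % n));
            }
            return targets;
        }
    }

In the meantime, the only lever I know of is that HDFS places the first replica on the local datanode when the client runs on one, so running the copy from the node you want each file on would at least get you your "one copy in the right place".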
Regards,
Jeff

On Tue, Feb 10, 2009 at 5:05 AM, Rasit OZDAS <rasitoz...@gmail.com> wrote:
>
> Hi,
>
> We have thousands of files, each dedicated to a user. (Each user has
> access to the other users' files, but they access them only rarely.)
> Each user runs map-reduce jobs on the cluster.
> So we want to spread each user's files equally across the cluster,
> so that every machine can take part in the processing (assuming he/she
> is the only user running jobs).
> For this we would initially copy the files to specified nodes:
> User A : first file : Node 1, second file : Node 2, .. etc.
> User B : first file : Node 1, second file : Node 2, .. etc.
>
> I know Hadoop also creates replicas, but with our scheme at least one
> copy of each file would be in the right place
> (and we're willing to control the other replicas too).
>
> Rebalancing is also not a problem, assuming it takes into account how
> heavily each machine is used.
> It would even help organize the files better.
>
> How can we copy files to specified nodes?
> Or do you have a better solution for us?
>
> I couldn't find a way to do this; probably such an option doesn't
> exist. But I wanted to get an expert's opinion on it.
>
> Thanks in advance..
> Rasit