Hi Nikhil,

The FUSE mount is what allows the filesystem layer to access distributed files in Gluster: that is, GlusterFS provides its own FUSE mount, and GlusterFileSystem wraps that mount in Hadoop FileSystem semantics.
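To make the "wraps the FUSE mount" idea concrete, here is a minimal sketch (not the actual glusterfs-hadoop source; the class name, URI scheme handling, and mount point are illustrative assumptions) of how such a shim can delegate DFS-style paths to a locally FUSE-mounted volume:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch only: shows path delegation to a FUSE mount,
// not the real GlusterFileSystem implementation.
public class FuseShim {
    // Hypothetical mount point where the GlusterFS volume is FUSE-mounted.
    private final Path mountRoot;

    public FuseShim(String mountRoot) {
        this.mountRoot = Paths.get(mountRoot);
    }

    // Map a DFS-style URI like glusterfs://vol/user/data to the local FUSE path;
    // all reads and writes then go through the ordinary POSIX filesystem,
    // which is how the FUSE layer reaches the distributed files.
    public Path toLocal(String dfsPath) {
        String stripped = dfsPath.replaceFirst("^glusterfs://[^/]*", "");
        return mountRoot.resolve(stripped.replaceFirst("^/", ""));
    }

    public static void main(String[] args) {
        FuseShim shim = new FuseShim("/mnt/glusterfs");
        System.out.println(shim.toLocal("glusterfs://volume1/user/hadoop/input"));
    }
}
```

The real wrapper additionally implements the full org.apache.hadoop.fs.FileSystem contract (open, create, listStatus, getFileBlockLocations, and so on) on top of this kind of path translation.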
Meanwhile, the MapReduce jobs are invoked using custom core-site and mapred-site XML entries which specify GlusterFileSystem as the DFS.

On Feb 22, 2013, at 3:17 AM, Nikhil Agarwal <[email protected]> wrote:

> Hi All,
>
> Thanks a lot for taking out your time to answer my question.
>
> I am trying to implement a file system in Hadoop under the
> org.apache.hadoop.fs package, something similar to KFS, GlusterFS, etc.
> I wanted to know: in the README.txt of GlusterFS it is mentioned:
>
> >> # ./bin/start-mapred.sh
> >> If the map/reduce job/task trackers are up, all I/O will be done to
> >> GlusterFS.
>
> So, suppose my input files are scattered across different nodes (GlusterFS
> servers); how do I (a Hadoop client with GlusterFS plugged in) issue a
> MapReduce command?
>
> Moreover, after issuing a MapReduce command, would my Hadoop client fetch
> all the data from the different servers to my local machine and then do the
> MapReduce, or would it start the TaskTracker daemons on the machine(s)
> where the input file(s) are located and perform the MapReduce there?
>
> Please correct me if I am wrong, but I suppose that the location of input
> files for MapReduce is returned by the function getFileBlockLocations
> (FileStatus file, long start, long len).
>
> Thank you very much for your time and helping me out.
>
> Regards,
> Nikhil
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
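For reference, the custom core-site.xml override mentioned above looks roughly like the sketch below. The property names follow the glusterfs-hadoop plugin's conventions, but exact names and values vary by plugin version, so treat this as an assumption-laden example rather than a drop-in config:

```xml
<!-- Sketch of a core-site.xml pointing Hadoop at GlusterFS.
     Property names are assumptions based on the glusterfs-hadoop
     plugin and may differ between versions. -->
<configuration>
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>glusterfs://server:9000</value>
  </property>
  <property>
    <!-- Hypothetical: the local FUSE mount point backing the wrapper. -->
    <name>fs.glusterfs.mount</name>
    <value>/mnt/glusterfs</value>
  </property>
</configuration>
```

With the default filesystem set this way, the job/task trackers started by start-mapred.sh do all their I/O through GlusterFileSystem, and the JobTracker uses getFileBlockLocations() to schedule tasks near the data, as the quoted message suspects.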
