I think it's safe to assume that Hadoop works like MapReduce/GFS at the level described in those papers. In particular, in HDFS there is a master node (the namenode) containing metadata and a number of slave nodes (datanodes) containing blocks, as in GFS.

Clients start by talking to the master to list directories, etc. When they want to read a region of some file, they tell the master the filename and offset, and they receive a list of block locations (datanodes). They then contact the individual datanodes to read the blocks. When clients write a file, they first obtain a new block ID and a list of datanodes to write it to from the master, then contact those datanodes to write it (actually, the datanodes pipeline the write as in GFS) and report when the write is complete.

HDFS actually has some security mechanisms built in, authenticating users based on their Unix ID and providing Unix-like file permissions. I don't know much about how these are implemented, but they would be a good place to start looking.

On Sun, Feb 15, 2009 at 1:36 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> Thanks Matei,
>
> I had gone through the architecture document online. I am currently working
> on a project on security in Hadoop. I do know how the data moves around
> in GFS but wasn't sure how much of that HDFS follows and how
> different it is from GFS. Can you throw some light on that?
>
> Security would also involve the MapReduce jobs following the same
> protocols. That's why I asked how the Hadoop framework integrates
> with HDFS, and how different that is from MapReduce and GFS.
> The GFS and MapReduce papers give good information on how those systems
> are designed, but I have not been able to find anything that concrete
> for Hadoop.
>
> Amandeep
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Sun, Feb 15, 2009 at 12:07 PM, Matei Zaharia <ma...@cloudera.com> wrote:
>
> > Hi Amandeep,
> >
> > Hadoop is definitely inspired by MapReduce/GFS and aims to provide those
> > capabilities as an open-source project. HDFS is similar to GFS (large
> > blocks, replication, etc.); some notable things missing are read-write
> > support in the middle of a file (unlikely to be provided because few Hadoop
> > applications require it) and multiple appenders (the record append
> > operation). You can read about HDFS architecture at
> > http://hadoop.apache.org/core/docs/current/hdfs_design.html. The MapReduce
> > part of Hadoop interacts with HDFS in the same way that Google's MapReduce
> > interacts with GFS (shipping computation to the data), although Hadoop
> > MapReduce also supports running over other distributed filesystems.
> >
> > Matei
> >
> > On Sun, Feb 15, 2009 at 11:57 AM, Amandeep Khurana <ama...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > Is the HDFS architecture completely based on the Google File System? If it
> > > isn't, what are the differences between the two?
> > >
> > > Secondly, is the coupling between Hadoop and HDFS the same as it is
> > > between Google's version of MapReduce and GFS?
> > >
> > > Amandeep
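For readers who want the read and write paths from the reply at the top of this thread in concrete form, here is a minimal sketch under stated assumptions: a toy in-memory model in Python, not Hadoop code. The names `ToyNamenode`, `allocate_block`, `locate` and `pipeline_write`, the 64 MB block size, and the round-robin replica placement are all hypothetical. It also assumes every block except the last is full, so a byte offset maps to block index `offset // BLOCK_SIZE`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size (64 MB)


@dataclass
class ToyNamenode:
    """Hypothetical master: keeps metadata only, never file contents."""
    datanodes: List[str]
    replication: int = 3
    next_block_id: int = 0
    # filename -> ordered list of (block_id, [datanodes holding a replica])
    files: Dict[str, List[Tuple[int, List[str]]]] = field(default_factory=dict)

    def allocate_block(self, filename: str) -> Tuple[int, List[str]]:
        """Write path, step 1: hand out a new block ID and target datanodes."""
        block_id = self.next_block_id
        self.next_block_id += 1
        n = min(self.replication, len(self.datanodes))
        start = block_id % len(self.datanodes)  # naive round-robin placement
        targets = [self.datanodes[(start + i) % len(self.datanodes)] for i in range(n)]
        self.files.setdefault(filename, []).append((block_id, targets))
        return block_id, targets

    def locate(self, filename: str, offset: int) -> Tuple[int, List[str]]:
        """Read path: map (filename, offset) to a block ID and its datanodes."""
        return self.files[filename][offset // BLOCK_SIZE]


def pipeline_write(block_id: int, data: bytes, targets: List[str],
                   storage: Dict[str, Dict[int, bytes]]) -> List[str]:
    """Write path, step 2: the first datanode stores the block and forwards it
    down the pipeline; acknowledgements come back in pipeline order."""
    if not targets:
        return []
    head, rest = targets[0], targets[1:]
    storage.setdefault(head, {})[block_id] = data  # head stores its replica
    return [head] + pipeline_write(block_id, data, rest, storage)  # forward


if __name__ == "__main__":
    nn = ToyNamenode(datanodes=["dn1", "dn2", "dn3", "dn4"])
    storage: Dict[str, Dict[int, bytes]] = {}
    for chunk in (b"first block", b"second block"):
        block_id, targets = nn.allocate_block("/logs/a.txt")
        pipeline_write(block_id, chunk, targets, storage)
    block_id, locations = nn.locate("/logs/a.txt", BLOCK_SIZE + 10)
    print(block_id, locations)               # 1 ['dn2', 'dn3', 'dn4']
    print(storage[locations[0]][block_id])   # b'second block'
```

The point the sketch tries to capture is that the master only ever hands out metadata (block IDs and locations); the bytes themselves travel between the client and the datanodes.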
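Since the thread points to HDFS's Unix-like file permissions as a starting point for the security work, here is a generic sketch of the classic Unix owner/group/other check on `rwx` mode bits, in Python. It illustrates the model only and is not HDFS's implementation; the `FileStatus` record and the `check_access` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet

READ, WRITE, EXECUTE = 4, 2, 1  # rwx bits within one octal digit


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical per-file metadata of the kind a master would keep."""
    owner: str
    group: str
    mode: int  # e.g. 0o640 -> owner rw-, group r--, other ---


def check_access(user: str, groups: FrozenSet[str], status: FileStatus,
                 wanted: int) -> bool:
    """Return True if `user` may perform `wanted` (a mix of READ/WRITE/EXECUTE).

    Unix-style rule: pick exactly one class (owner, else group, else other)
    and test only that class's bits."""
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return (bits & wanted) == wanted


if __name__ == "__main__":
    st = FileStatus(owner="alice", group="analysts", mode=0o640)
    print(check_access("bob", frozenset({"analysts"}), st, READ))   # True
    print(check_access("bob", frozenset({"analysts"}), st, WRITE))  # False
```

One property of this model worth keeping in mind when reasoning about security: only one class is consulted, so an owner whose own bits are stricter than the group's is denied even if the owner is also a member of that group (mode 0o070 locks the owner out).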
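Matei's earlier message says Hadoop MapReduce interacts with HDFS by "shipping computation to the data". A toy sketch of that idea, assuming the scheduler already knows which hosts hold a replica of each map task's input block: when a worker asks for work, prefer a task whose input is stored on that worker's own host. The function `pick_task` and its input format are hypothetical, and real Hadoop scheduling also weighs rack locality and other factors this sketch ignores.

```python
from typing import Dict, List, Optional


def pick_task(host: str, pending: Dict[str, List[str]]) -> Optional[str]:
    """Choose a map task for a worker running on `host`.

    `pending` maps task ID -> hosts holding a replica of that task's input
    block (the kind of answer the master returns for a block lookup).
    Prefer a data-local task; otherwise fall back to any pending task,
    which would then have to read its input over the network."""
    for task_id, replica_hosts in pending.items():
        if host in replica_hosts:
            return task_id  # data-local: no network transfer of the input
    return next(iter(pending), None)  # non-local fallback, or None if empty


if __name__ == "__main__":
    pending = {"map-0": ["dn1", "dn2"], "map-1": ["dn3", "dn4"]}
    print(pick_task("dn3", pending))  # map-1
    print(pick_task("dn9", pending))  # map-0
```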