Re: HDFS architecture based on GFS?

Matei Zaharia Sun, 15 Feb 2009 14:36:05 -0800

I think it's safe to assume that Hadoop works like MapReduce/GFS at the
level described in those papers. In particular, in HDFS, there is a master
node containing metadata and a number of slave nodes (datanodes) containing
blocks, as in GFS. Clients start by talking to the master to list
directories, etc. When they want to read a region of some file, they tell
the master the filename and offset, and they receive a list of block
locations (datanodes). They then contact the individual datanodes to read
the blocks. When clients write a file, they first obtain a new block ID and
list of nodes to write it to from the master, then contact the datanodes to
write it (actually, the datanodes pipeline the write as in GFS) and report
when the write is complete.

HDFS actually has some security mechanisms built in, authenticating users
based on their Unix ID and providing Unix-like file permissions. I don't know
much about how these are implemented, but they would be a good place to start
looking.
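
To make the read and write paths above concrete, here is a toy in-memory
sketch. It is not the Hadoop API: the names Master, DataNode, allocate_block,
get_block_locations, write_file and read_at are all made up for illustration,
the block size is tiny, replica placement is simple round-robin, and the
"report when the write is complete" step is left out.

```python
# Toy in-memory model of the read/write flow described above.
# All class and method names are hypothetical, not the Hadoop API.
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; tiny on purpose (real HDFS blocks are tens of MB)


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block ID -> bytes

    def write_block(self, block_id, data, pipeline):
        # Store locally, then forward to the next datanode in the pipeline.
        self.blocks[block_id] = data
        if pipeline:
            pipeline[0].write_block(block_id, data, pipeline[1:])

    def read_block(self, block_id):
        return self.blocks[block_id]


class Master:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self, datanodes, replication=2):
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}      # filename -> list of block IDs, in file order
        self.locations = {}  # block ID -> list of DataNode replicas
        self.next_id = 0

    def allocate_block(self, filename):
        # Hand the client a new block ID and the datanodes to write it to.
        block_id = self.next_id
        self.next_id += 1
        start = block_id % len(self.datanodes)  # assumed round-robin placement
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(self.replication)]
        self.files.setdefault(filename, []).append(block_id)
        self.locations[block_id] = targets
        return block_id, targets

    def get_block_locations(self, filename, offset):
        # Map a (filename, offset) pair to a block ID and its replicas.
        blocks = self.files[filename]
        index = offset // BLOCK_SIZE
        if index >= len(blocks):
            return None  # offset is past the end of the file
        return blocks[index], self.locations[blocks[index]]


def write_file(master, filename, data):
    for i in range(0, len(data), BLOCK_SIZE):
        block_id, targets = master.allocate_block(filename)
        # The client contacts only the first datanode; the rest is pipelined.
        targets[0].write_block(block_id, data[i:i + BLOCK_SIZE], targets[1:])


def read_at(master, filename, offset, length):
    out = b""
    while length > 0:
        located = master.get_block_locations(filename, offset)
        if located is None:
            break
        block_id, replicas = located
        start = offset % BLOCK_SIZE
        # Read directly from a datanode; the master never serves file data.
        chunk = replicas[0].read_block(block_id)[start:start + length]
        if not chunk:
            break  # reached the end of the file
        out += chunk
        offset += len(chunk)
        length -= len(chunk)
    return out
```

For example, with three DataNode objects and a Master built over them,
write_file(master, "/a.txt", b"hello world") followed by
read_at(master, "/a.txt", 2, 6) goes to the master once per block for
locations and pulls the bytes themselves from the datanodes.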

On Sun, Feb 15, 2009 at 1:36 PM, Amandeep Khurana <ama...@gmail.com> wrote:

> Thanks Matei
>
> I had gone through the architecture document online. I am currently working
> on a project towards security in Hadoop. I do know how the data moves around
> in GFS but wasn't sure how much of that HDFS follows and how different it
> is from GFS. Can you throw some light on that?
>
> Security would also involve the MapReduce jobs following the same
> protocols. That's why I asked how the Hadoop framework integrates with
> HDFS, and how different it is from MapReduce and GFS. The GFS and MapReduce
> papers give good information on how those systems are designed, but there is
> nothing that concrete for Hadoop that I have been able to find.
>
> Amandeep
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Sun, Feb 15, 2009 at 12:07 PM, Matei Zaharia <ma...@cloudera.com>
> wrote:
>
> > Hi Amandeep,
> > Hadoop is definitely inspired by MapReduce/GFS and aims to provide those
> > capabilities as an open-source project. HDFS is similar to GFS (large
> > blocks, replication, etc.); some notable things missing are read-write
> > support in the middle of a file (unlikely to be provided because few Hadoop
> > applications require it) and multiple appenders (the record append
> > operation). You can read about HDFS architecture at
> > http://hadoop.apache.org/core/docs/current/hdfs_design.html. The MapReduce
> > part of Hadoop interacts with HDFS in the same way that Google's MapReduce
> > interacts with GFS (shipping computation to the data), although Hadoop
> > MapReduce also supports running over other distributed filesystems.
> >
> > Matei
> >
> > On Sun, Feb 15, 2009 at 11:57 AM, Amandeep Khurana <ama...@gmail.com>
> > wrote:
> >
> > > Hi
> > >
> > > Is the HDFS architecture completely based on the Google File System? If it
> > > isn't, what are the differences between the two?
> > >
> > > Secondly, is the coupling between Hadoop and HDFS the same as that between
> > > Google's version of MapReduce and GFS?
> > >
> > > Amandeep
> > >
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> >
>
