Re: HDFS architecture based on GFS?

Amandeep Khurana Sun, 15 Feb 2009 15:21:25 -0800

Thanks Matei. If the basic architecture is similar to the Google stuff, I
can safely just work on the project using the information from the papers.


I am aware of the HADOOP-4487 JIRA and the current status of the
permissions mechanism. I had a look at both earlier.

Cheers
Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Sun, Feb 15, 2009 at 2:40 PM, Matei Zaharia <ma...@cloudera.com> wrote:

> Forgot to add, this JIRA details the latest security features that are
> being worked on in Hadoop trunk:
> https://issues.apache.org/jira/browse/HADOOP-4487.
> This document describes the current status and limitations of the
> permissions mechanism:
> http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html.
>
> On Sun, Feb 15, 2009 at 2:35 PM, Matei Zaharia <ma...@cloudera.com> wrote:
>
> > I think it's safe to assume that Hadoop works like MapReduce/GFS at the
> > level described in those papers. In particular, in HDFS, there is a
> > master node containing metadata and a number of slave nodes (datanodes)
> > containing blocks, as in GFS. Clients start by talking to the master to
> > list directories, etc. When they want to read a region of some file, they
> > tell the master the filename and offset, and they receive a list of block
> > locations (datanodes). They then contact the individual datanodes to read
> > the blocks. When clients write a file, they first obtain a new block ID
> > and list of nodes to write it to from the master, then contact the
> > datanodes to write it (actually, the datanodes pipeline the write as in
> > GFS) and report when the write is complete. HDFS actually has some
> > security mechanisms built in, authenticating users based on their Unix ID
> > and providing Unix-like file permissions. I don't know much about how
> > these are implemented, but they would be a good place to start looking.
> >
> > On Sun, Feb 15, 2009 at 1:36 PM, Amandeep Khurana <ama...@gmail.com>
> > wrote:
> >
> >> Thanks Matei,
> >>
> >> I had gone through the architecture document online. I am currently
> >> working on a project on security in Hadoop. I know how data moves
> >> around in GFS, but wasn't sure how much of that HDFS follows and how
> >> it differs from GFS. Can you shed some light on that?
> >>
> >> Security would also involve the MapReduce jobs following the same
> >> protocols. That's why I asked how the Hadoop framework integrates
> >> with HDFS, and how different that is from Google's MapReduce and GFS.
> >> The GFS and MapReduce papers give good information on how those
> >> systems are designed, but I have not been able to find anything that
> >> concrete for Hadoop.
> >>
> >> Amandeep
> >>
> >>
> >> Amandeep Khurana
> >> Computer Science Graduate Student
> >> University of California, Santa Cruz
> >>
> >>
> >> On Sun, Feb 15, 2009 at 12:07 PM, Matei Zaharia <ma...@cloudera.com>
> >> wrote:
> >>
> >> > Hi Amandeep,
> >> > Hadoop is definitely inspired by MapReduce/GFS and aims to provide
> >> > those capabilities as an open-source project. HDFS is similar to GFS
> >> > (large blocks, replication, etc); some notable things missing are
> >> > read-write support in the middle of a file (unlikely to be provided
> >> > because few Hadoop applications require it) and multiple appenders
> >> > (the record append operation). You can read about HDFS architecture at
> >> > http://hadoop.apache.org/core/docs/current/hdfs_design.html. The
> >> > MapReduce part of Hadoop interacts with HDFS in the same way that
> >> > Google's MapReduce interacts with GFS (shipping computation to the
> >> > data), although Hadoop MapReduce also supports running over other
> >> > distributed filesystems.
> >> >
> >> > Matei
> >> >
> >> > On Sun, Feb 15, 2009 at 11:57 AM, Amandeep Khurana <ama...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi
> >> > >
> >> > > Is the HDFS architecture completely based on the Google File System?
> >> > > If it isn't, what are the differences between the two?
> >> > >
> >> > > Secondly, is the coupling between Hadoop and HDFS the same as that
> >> > > between Google's version of MapReduce and GFS?
> >> > >
> >> > > Amandeep
> >> > >
> >> > >
> >> > > Amandeep Khurana
> >> > > Computer Science Graduate Student
> >> > > University of California, Santa Cruz
> >> > >
> >> >
> >>
> >
> >
>
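
The read and write paths described in the quoted thread above can be sketched
as a toy in-memory model. This is only an illustration under stated
assumptions, not Hadoop code: the class and function names (Master, DataNode,
write_file, read_at), the 4-byte block size, and the round-robin replica
placement are all hypothetical stand-ins chosen to keep the example small.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; tiny on purpose so a short file spans several blocks


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes

    def write(self, block_id, data, pipeline):
        """Store the block, then forward it to the next node in the pipeline."""
        self.blocks[block_id] = data
        if pipeline:
            pipeline[0].write(block_id, data, pipeline[1:])

    def read(self, block_id):
        return self.blocks[block_id]


@dataclass
class Master:
    datanodes: list
    replication: int = 2
    files: dict = field(default_factory=dict)      # filename -> [block_id, ...]
    locations: dict = field(default_factory=dict)  # block_id -> [DataNode, ...]
    next_id: int = 0

    def allocate_block(self, filename):
        """Hand out a new block ID and the datanodes that should hold it."""
        block_id = self.next_id
        self.next_id += 1
        # Round-robin placement: an assumption, not HDFS's placement policy.
        start = block_id % len(self.datanodes)
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(self.replication)]
        self.files.setdefault(filename, []).append(block_id)
        return block_id, targets

    def block_written(self, block_id, targets):
        """Record replica locations once the client reports completion."""
        self.locations[block_id] = targets

    def locate(self, filename, offset):
        """Map (filename, offset) to a block ID and its replica locations."""
        blocks = self.files[filename]
        index = offset // BLOCK_SIZE
        if index >= len(blocks):
            return None  # offset is past the last block
        return blocks[index], self.locations[blocks[index]]


def write_file(master, filename, data):
    """Client write path: ask the master per block, then write to datanodes."""
    for i in range(0, len(data), BLOCK_SIZE):
        block_id, targets = master.allocate_block(filename)
        # The client contacts only the first node; it pipelines to the rest.
        targets[0].write(block_id, data[i:i + BLOCK_SIZE], targets[1:])
        master.block_written(block_id, targets)


def read_at(master, filename, offset, length):
    """Client read path: ask the master for locations, read from a datanode."""
    out = b""
    while length > 0:
        located = master.locate(filename, offset)
        if located is None:
            break
        block_id, replicas = located
        chunk = replicas[0].read(block_id)[offset % BLOCK_SIZE:][:length]
        if not chunk:
            break  # reached end of file inside the last block
        out += chunk
        offset += len(chunk)
        length -= len(chunk)
    return out


nodes = [DataNode(f"dn{i}") for i in range(3)]
master = Master(nodes)
write_file(master, "/logs/a.txt", b"hello hdfs")
print(read_at(master, "/logs/a.txt", 3, 5))  # b'lo hd'
```

Note that file data never passes through the master in this sketch: it only
hands out block IDs and locations, which is the split between metadata and
block storage that the thread describes.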
