Hi Dieter

Very clear. The comic format indeed works quite well.
> I never considered comics as a serious ("professional") way to get
> something explained efficiently,
> but this shows people should think twice before they start writing their
> next documentation.
>

Thanks! :)


> one question though: if a DN has a corrupted block, why does the NN only
> remove the bad DN from the block's list, and not the block from the DN list?
>

You are right. This needs to be fixed.


> (also, does it really store the data in 2 separate tables?  This looks to
> me like 2 different views of the same data?)


Actually, it's more than two tables... I have personally found the data
structures rather contrived.

In the org.apache.hadoop.hdfs.server.namenode package, information is kept
in multiple places:
- INodeFile, which has a list of blocks for a given file
- FSNamesystem, which has a map of block -> {inode, datanodes}
- BlockInfo, which stores information in a rather strange manner:

    /**
     * This array contains triplets of references.
     * For each i-th data-node the block belongs to
     * triplets[3*i] is the reference to the DatanodeDescriptor
     * and triplets[3*i+1] and triplets[3*i+2] are references
     * to the previous and the next blocks, respectively, in the
     * list of blocks belonging to this data-node.
     */
    private Object[] triplets;
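To make that packing concrete, here is a minimal, hypothetical sketch of the
triplets layout (the class and method names are made up for illustration; the
real BlockInfo also carries the block's own metadata and the doubly-linked-list
manipulation logic):

```java
// Simplified, hypothetical illustration of the triplets layout described
// above -- NOT the real org.apache.hadoop.hdfs BlockInfo class.
public class MiniBlockInfo {
    // For the i-th datanode holding this block:
    //   triplets[3*i]   -> the datanode itself (a DatanodeDescriptor in
    //                      HDFS; a plain String here)
    //   triplets[3*i+1] -> previous block in that datanode's block list
    //   triplets[3*i+2] -> next block in that datanode's block list
    final Object[] triplets;

    MiniBlockInfo(int replication) {
        // One triplet per expected replica.
        this.triplets = new Object[3 * replication];
    }

    void addDatanode(int i, Object dn, Object prev, Object next) {
        triplets[3 * i]     = dn;
        triplets[3 * i + 1] = prev;
        triplets[3 * i + 2] = next;
    }

    Object getDatanode(int i) {
        return triplets[3 * i];
    }

    public static void main(String[] args) {
        // A block replicated on two datanodes; no list neighbours yet.
        MiniBlockInfo blk = new MiniBlockInfo(2);
        blk.addDatanode(0, "datanode-A", null, null);
        blk.addDatanode(1, "datanode-B", null, null);
        System.out.println(blk.getDatanode(0)); // prints datanode-A
    }
}
```

So one flat Object[] interleaves the per-datanode membership with the links of
each datanode's block list, which is compact but, as noted, rather contrived
to read.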





> On Thu, 1 Dec 2011 08:53:31 +0100
> "Alexander C.H. Lorenz" <wget.n...@googlemail.com> wrote:
>
> > Hi all,
> >
> > very cool comic!
> >
> > Thanks,
> >  Alex
> >
> > On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh
> > <manu.i...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > This is indeed a good way to explain; most of the improvements have
> > > already been discussed. Waiting for the sequel of this comic.
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney
> > > <mvarsh...@gmail.com> wrote:
> > >
> > > > Hi Matthew
> > > >
> > > > I agree with both you and Prashant. The strip needs to be
> > > > modified to explain that these can be default values that can be
> > > > optionally overridden (which I will fix in the next iteration).
> > > >
> > > > However, from the 'understanding concepts of HDFS' point of view,
> > > > I still think that block size and replication factors are the
> > > > real strengths of HDFS, and learners must be exposed to them so
> > > > that they get to see how HDFS is significantly different from
> > > > conventional file systems.
> > > >
> > > > On a personal note: thanks for the first part of your message :)
> > > >
> > > > -Maneesh
> > > >
> > > >
> > > > On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) <
> > > > matthew.go...@monsanto.com> wrote:
> > > >
> > > > > Maneesh,
> > > > >
> > > > > Firstly, I love the comic :)
> > > > >
> > > > > Secondly, I am inclined to agree with Prashant on this latest
> > > > > point. While one code path could take us through the user
> > > > > defining command line overrides (e.g. hadoop fs -D blah -put
> > > > > foo bar), I think it might confuse a person new to Hadoop. The
> > > > > most common flow would be using admin-determined values from
> > > > > hdfs-site, and the only thing that would need to change is that
> > > > > the conversation happens between client / server and not user /
> > > > > client.
> > > > >
> > > > > Matt
> > > > >
> > > > > -----Original Message-----
> > > > > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > > > > Sent: Wednesday, November 30, 2011 3:28 PM
> > > > > To: common-user@hadoop.apache.org
> > > > > Subject: Re: HDFS Explained as Comics
> > > > >
> > > > > Sure, it's just a case of how readers interpret it:
> > > > >
> > > > >   1. Client is required to specify block size and replication
> > > > >   factor each time
> > > > >   2. Client does not need to worry about it since an admin has
> > > > >   set the properties in default configuration files
> > > > >
> > > > > A client could not be allowed to override the default configs
> > > > > if they are set final (well, there are ways to go around it as
> > > > > well, as you suggest, by using create(....)) :)
> > > > >
> > > > > The information is great and helpful. Just want to make sure a
> > > > > beginner who wants to write a "WordCount" in MapReduce does not
> > > > > worry about specifying block size and replication factor in his
> > > > > code.
> > > > >
> > > > > Thanks,
> > > > > Prashant
> > > > >
> > > > > On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney
> > > > > <mvarsh...@gmail.com> wrote:
> > > > >
> > > > > > Hi Prashant
> > > > > >
> > > > > > Others may correct me if I am wrong here..
> > > > > >
> > > > > > The client (org.apache.hadoop.hdfs.DFSClient) has knowledge
> > > > > > of block size and replication factor. In the source code, I
> > > > > > see the following in the DFSClient constructor:
> > > > > >
> > > > > >    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
> > > > > >
> > > > > >    defaultReplication = (short) conf.getInt("dfs.replication", 3);
> > > > > >
> > > > > > My understanding is that the client considers the following
> > > > > > chain for the values:
> > > > > > 1. Manual values (the long-form constructor; when a user
> > > > > > provides these values)
> > > > > > 2. Configuration file values (these are cluster-level
> > > > > > defaults: dfs.block.size and dfs.replication)
> > > > > > 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
> > > > > >
> > > > > > Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol
> > > > > > the API to create a file is
> > > > > > void create(...., short replication, long blocksize);
> > > > > >
> > > > > > I presume it means that the client already has knowledge of
> > > > > > these values and passes them to the NameNode when creating a
> > > > > > new file.
> > > > > >
> > > > > > Hope that helps.
> > > > > >
> > > > > > thanks
> > > > > > -Maneesh
> > > > > >
> > > > > > On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi
> > > > > > <prash1...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Maneesh.
> > > > > > >
> > > > > > > Quick question: does a client really need to know block
> > > > > > > size and replication factor? A lot of times the client has
> > > > > > > no control over these (set at cluster level).
> > > > > > >
> > > > > > > -Prashant Kommireddi
> > > > > > >
> > > > > > > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges
> > > > > > > <dejan.men...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Maneesh,
> > > > > > > >
> > > > > > > > Thanks a lot for this! Just distributed it over the team,
> > > > > > > > and the comments are great :)
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Dejan
> > > > > > > >
> > > > > > > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney
> > > > > > > > <mvarsh...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > For your reading pleasure!
> > > > > > > > >
> > > > > > > > > PDF (3.3MB) uploaded at (the mailing list has a cap of
> > > > > > > > > 1MB on attachments):
> > > > > > > > >
> > > > > > > > > https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > > > > > > >
> > > > > > > > > Appreciate it if you can spare some time to peruse this
> > > > > > > > > little experiment of mine to use comics as a medium to
> > > > > > > > > explain computer science topics. This particular issue
> > > > > > > > > explains the protocols and internals of HDFS.
> > > > > > > > >
> > > > > > > > > I am eager to hear your opinions on the usefulness of
> > > > > > > > > this visual medium to teach complex protocols and
> > > > > > > > > algorithms.
> > > > > > > > >
> > > > > > > > > [My personal motivations: I have always found text
> > > > > > > > > descriptions to be too verbose, as a lot of effort is
> > > > > > > > > spent putting the concepts in proper time-space context
> > > > > > > > > (which can be easily avoided in a visual medium);
> > > > > > > > > sequence diagrams are unwieldy for non-trivial
> > > > > > > > > protocols, and they do not explain concepts; and
> > > > > > > > > finally, animations/videos happen "too fast" and do not
> > > > > > > > > offer a self-paced learning experience.]
> > > > > > > > >
> > > > > > > > > All forms of criticism, comments (and encouragement)
> > > > > > > > > welcome :)
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Maneesh
> > > > > > > >
> > > > > > >
> > > > > >
