Hi Dieter,

Very clear. The comic format indeed works quite well.

> I never considered comics as a serious ("professional") way to get
> something explained efficiently, but this shows people should think
> twice before they start writing their next documentation.
Thanks! :)

> one question though: if a DN has a corrupted block, why does the NN only
> remove the bad DN from the block's list, and not the block from the DN
> list?

You are right. This needs to be fixed.

> (also, does it really store the data in 2 separate tables? This looks to
> me like 2 different views of the same data?)

Actually, it's more than two tables... I have personally found the data
structures rather contrived. In the org.apache.hadoop.hdfs.server.namenode
package, information is kept in multiple places:

- INodeFile, which has a list of blocks for a given file
- FSNamesystem, which has a map of block -> {inode, datanodes}
- BlockInfo, which stores information in a rather strange manner:

  /**
   * This array contains triplets of references.
   * For each i-th data-node the block belongs to,
   * triplets[3*i] is the reference to the DatanodeDescriptor,
   * and triplets[3*i+1] and triplets[3*i+2] are references
   * to the previous and the next blocks, respectively, in the
   * list of blocks belonging to this data-node.
   */
  private Object[] triplets;

> On Thu, 1 Dec 2011 08:53:31 +0100
> "Alexander C.H. Lorenz" <wget.n...@googlemail.com> wrote:
>
> > Hi all,
> >
> > very cool comic!
> >
> > Thanks,
> > Alex
> >
> > On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh
> > <manu.i...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > This is indeed a good way to explain; most of the improvements have
> > > already been discussed. Waiting for the sequel of this comic.
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney
> > > <mvarsh...@gmail.com> wrote:
> > >
> > > > Hi Matthew,
> > > >
> > > > I agree with both you and Prashant. The strip needs to be
> > > > modified to explain that these are default values that can be
> > > > optionally overridden (which I will fix in the next iteration).
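To make the triplets layout concrete, here is a small sketch of how that flat
array is addressed. This is my own illustration only; the class and helper
names are made up, not Hadoop's:

```java
// Illustrative sketch, NOT the real BlockInfo. It only shows the indexing
// scheme from the javadoc above: for replica i, slot 3*i holds the
// DatanodeDescriptor, and slots 3*i+1 / 3*i+2 hold the previous and next
// blocks in that datanode's block list.
public class TripletsSketch {
    private final Object[] triplets;

    public TripletsSketch(int replication) {
        // one triplet (3 slots) per datanode holding a replica
        this.triplets = new Object[3 * replication];
    }

    // slot of the DatanodeDescriptor reference for replica i
    public static int datanodeSlot(int i) { return 3 * i; }
    // slot of the previous-block reference in datanode i's block list
    public static int prevSlot(int i)     { return 3 * i + 1; }
    // slot of the next-block reference in datanode i's block list
    public static int nextSlot(int i)     { return 3 * i + 2; }

    public Object getDatanode(int i)          { return triplets[datanodeSlot(i)]; }
    public Object getPrev(int i)              { return triplets[prevSlot(i)]; }
    public Object getNext(int i)              { return triplets[nextSlot(i)]; }
    public void setDatanode(int i, Object dn) { triplets[datanodeSlot(i)] = dn; }
}
```

So for a block replicated three times the array has nine slots, and walking
the list of blocks on a given datanode means hopping through these prev/next
references rather than consulting a separate per-datanode list object, which
is exactly why the structure feels contrived at first reading.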
> > > >
> > > > However, from the 'understanding the concepts of HDFS' point of
> > > > view, I still think that block size and replication factor are
> > > > the real strengths of HDFS, and learners must be exposed to them
> > > > so that they get to see how HDFS is significantly different from
> > > > conventional file systems.
> > > >
> > > > On a personal note: thanks for the first part of your message :)
> > > >
> > > > -Maneesh
> > > >
> > > > On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000)
> > > > <matthew.go...@monsanto.com> wrote:
> > > >
> > > > > Maneesh,
> > > > >
> > > > > Firstly, I love the comic :)
> > > > >
> > > > > Secondly, I am inclined to agree with Prashant on this latest
> > > > > point. While one code path could take us through the user
> > > > > defining command-line overrides (e.g. hadoop fs -D blah -put
> > > > > foo bar), I think it might confuse a person new to Hadoop. The
> > > > > most common flow would be using admin-determined values from
> > > > > hdfs-site, and the only thing that would need to change is that
> > > > > the conversation happens between client / server and not
> > > > > user / client.
> > > > >
> > > > > Matt
> > > > >
> > > > > -----Original Message-----
> > > > > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > > > > Sent: Wednesday, November 30, 2011 3:28 PM
> > > > > To: common-user@hadoop.apache.org
> > > > > Subject: Re: HDFS Explained as Comics
> > > > >
> > > > > Sure, it's just a case of how readers interpret it:
> > > > >
> > > > > 1. The client is required to specify block size and replication
> > > > >    factor each time, or
> > > > > 2. The client does not need to worry about them, since an admin
> > > > >    has set the properties in the default configuration files.
> > > > >
> > > > > A client would not be allowed to override the default configs
> > > > > if they are set final (well, there are ways to get around that
> > > > > as well, as you suggest, by using create(....) :)
> > > > >
> > > > > The information is great and helpful. I just want to make sure
> > > > > a beginner who wants to write a "WordCount" in MapReduce does
> > > > > not worry about specifying block size and replication factor in
> > > > > his code.
> > > > >
> > > > > Thanks,
> > > > > Prashant
> > > > >
> > > > > On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney
> > > > > <mvarsh...@gmail.com> wrote:
> > > > >
> > > > > > Hi Prashant,
> > > > > >
> > > > > > Others may correct me if I am wrong here...
> > > > > >
> > > > > > The client (org.apache.hadoop.hdfs.DFSClient) has knowledge
> > > > > > of block size and replication factor. In the source code, I
> > > > > > see the following in the DFSClient constructor:
> > > > > >
> > > > > >   defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
> > > > > >   defaultReplication = (short) conf.getInt("dfs.replication", 3);
> > > > > >
> > > > > > My understanding is that the client considers the following
> > > > > > chain for the values:
> > > > > > 1. Manual values (the long-form constructor, when a user
> > > > > >    provides these values)
> > > > > > 2. Configuration file values (these are cluster-level
> > > > > >    defaults: dfs.block.size and dfs.replication)
> > > > > > 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
> > > > > >
> > > > > > Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol,
> > > > > > the API to create a file is
> > > > > >
> > > > > >   void create(...., short replication, long blocksize);
> > > > > >
> > > > > > I presume this means that the client already has knowledge of
> > > > > > these values and passes them to the NameNode when creating a
> > > > > > new file.
> > > > > >
> > > > > > Hope that helps.
> > > > > >
> > > > > > Thanks,
> > > > > > -Maneesh
> > > > > >
> > > > > > On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi
> > > > > > <prash1...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Maneesh.
> > > > > > >
> > > > > > > Quick question: does a client really need to know block
> > > > > > > size and replication factor? A lot of times the client has
> > > > > > > no control over these (they are set at the cluster level).
> > > > > > >
> > > > > > > -Prashant Kommireddi
> > > > > > >
> > > > > > > On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges
> > > > > > > <dejan.men...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Maneesh,
> > > > > > > >
> > > > > > > > Thanks a lot for this! Just distributed it over the team,
> > > > > > > > and the comments are great :)
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Dejan
> > > > > > > >
> > > > > > > > On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney
> > > > > > > > <mvarsh...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > For your reading pleasure!
> > > > > > > > >
> > > > > > > > > PDF, 3.3 MB, uploaded at (the mailing list has a cap of
> > > > > > > > > 1 MB on attachments):
> > > > > > > > >
> > > > > > > > > https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
> > > > > > > > >
> > > > > > > > > I would appreciate it if you could spare some time to
> > > > > > > > > peruse this little experiment of mine to use comics as
> > > > > > > > > a medium to explain computer science topics. This
> > > > > > > > > particular issue explains the protocols and internals
> > > > > > > > > of HDFS.
> > > > > > > > >
> > > > > > > > > I am eager to hear your opinions on the usefulness of
> > > > > > > > > this visual medium to teach complex protocols and
> > > > > > > > > algorithms.
> > > > > > > > >
> > > > > > > > > [My personal motivations: I have always found text
> > > > > > > > > descriptions to be too verbose, as a lot of effort is
> > > > > > > > > spent putting the concepts in proper time-space context
> > > > > > > > > (which can be easily avoided in a visual medium);
> > > > > > > > > sequence diagrams are unwieldy for non-trivial
> > > > > > > > > protocols, and they do not explain concepts; and
> > > > > > > > > finally, animations/videos happen "too fast" and do not
> > > > > > > > > offer a self-paced learning experience.]
> > > > > > > > >
> > > > > > > > > All forms of criticism, comments (and encouragement)
> > > > > > > > > welcome :)
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Maneesh
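P.S. The value-resolution order Maneesh describes in the thread (manual
value, then configuration file, then hardcoded default) can be sketched as
follows. This is my own simplified stand-in, not Hadoop's Configuration or
DFSClient code; only the property names dfs.block.size and dfs.replication
and the 64 MB / 3 defaults come from the discussion above:

```java
import java.util.Map;

// Simplified stand-in for the lookup chain in the DFSClient constructor.
// An explicit caller-supplied value wins; otherwise the cluster config
// applies; otherwise the hardcoded default is used.
public class DefaultsChain {
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB
    static final short DEFAULT_REPLICATION = 3;

    static long blockSize(Long explicit, Map<String, String> conf) {
        if (explicit != null) return explicit;       // 1. manual value
        String v = conf.get("dfs.block.size");
        if (v != null) return Long.parseLong(v);     // 2. cluster-level default
        return DEFAULT_BLOCK_SIZE;                   // 3. hardcoded default
    }

    static short replication(Short explicit, Map<String, String> conf) {
        if (explicit != null) return explicit;       // 1. manual value
        String v = conf.get("dfs.replication");
        if (v != null) return Short.parseShort(v);   // 2. cluster-level default
        return DEFAULT_REPLICATION;                  // 3. hardcoded default
    }
}
```

This is also why a beginner writing WordCount never touches any of it: with
no explicit value in the code, the admin's hdfs-site values (or, failing
that, the hardcoded defaults) flow silently into the create(...) call.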