To all,

I have been following this board for the past few weeks, and the information has been great, so I appreciate the amount of sharing that has been going on.
I am in the "newbie" category here, so there is something I need some guidance on. I think I have a basic understanding of HDFS and how data is loaded into it. What I haven't figured out yet is how you organize the data. I know how to do it with a relational database, but I have read that Yahoo has installations with more than 60 million files. At the end of the day, you need SOME idea of what you are accessing, don't you?

Anything that talks about the organization of data in HDFS and the approach to querying against it would be very helpful.

Thanks in advance!

On Mon, Oct 24, 2011 at 12:26 PM, Anupam Seth <[email protected]> wrote:

> Hi Mike,
>
> This might help address your question:
>
> http://storageconference.org/2010/Papers/MSST/Shvachko.pdf
>
> Regards,
> Anupam
>
> -----Original Message-----
> From: panamamike [mailto:[email protected]]
> Sent: Sunday, October 23, 2011 9:59 AM
> To: [email protected]
> Subject: Need help understanding Hadoop Architecture
>
>
> I'm new to Hadoop. I've read a few articles and presentations which are
> directed at explaining what Hadoop is, and how it works. Currently my
> understanding is Hadoop is an MPP system which leverages the use of large
> block size to quickly find data. In theory, I understand how a large block
> size along with an MPP architecture as well as using what I'm understanding
> to be a massive index scheme via mapreduce can be used to find data.
>
> What I don't understand is how, after you identify the appropriate 64MB
> block, do you find the data you're specifically after? Does this mean
> the CPU has to search the entire 64MB block for the data of interest? If
> so, how does Hadoop know what data from that block to retrieve?
>
> I'm assuming the block is probably composed of one or more files. If not,
> I'm assuming the user isn't looking for the entire 64MB block, rather a portion
> of it.
>
> Any help indicating documentation, books, articles on the subject would be
> much appreciated.
>
> Regards,
>
> Mike
> --
> View this message in context:
> http://old.nabble.com/Need-help-understanding-Hadoop-Architecture-tp32705405p32705405.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
