Hi Mike,

This might help address your question:
http://storageconference.org/2010/Papers/MSST/Shvachko.pdf

Regards,
Anupam

-----Original Message-----
From: panamamike [mailto:[email protected]]
Sent: Sunday, October 23, 2011 9:59 AM
To: [email protected]
Subject: Need help understanding Hadoop Architecture

I'm new to Hadoop. I've read a few articles and presentations aimed at explaining what Hadoop is and how it works. My current understanding is that Hadoop is an MPP system which leverages a large block size to quickly find data.

In theory, I understand how a large block size, combined with an MPP architecture and what I understand to be a massive indexing scheme via MapReduce, can be used to find data. What I don't understand is how, after you identify the appropriate 64MB block, you find the data you're specifically after. Does this mean the CPU has to search the entire 64MB block for the data of interest? If so, how does Hadoop know what data from that block to retrieve? I'm assuming the block is probably composed of one or more files. If not, I'm assuming the user isn't looking for the entire 64MB block but rather a portion of it.

Any help pointing to documentation, books, or articles on the subject would be much appreciated.

Regards,
Mike

--
View this message in context: http://old.nabble.com/Need-help-understanding-Hadoop-Architecture-tp32705405p32705405.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
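For what it's worth, the short answer to the "search the whole block" worry: an HDFS client addresses a file by byte offset, the NameNode maps each offset to a specific block (and its replica locations), and the client then seeks directly to the position inside that block. No scan of the 64MB block is needed to locate a byte range. A minimal sketch of that offset-to-block arithmetic (the block size here is the classic 64MB default; the offsets are illustrative, not from any real cluster):

```python
BLOCK_SIZE = 64 * 1024 * 1024  # classic HDFS default block size, 64 MB

def locate(file_offset):
    """Map a byte offset within an HDFS file to
    (block index, offset inside that block)."""
    return file_offset // BLOCK_SIZE, file_offset % BLOCK_SIZE

# A read starting at byte 150,000,000 of a file falls in block index 2,
# 15,782,272 bytes into that block -- blocks 0 and 1 are never touched.
block, pos = locate(150_000_000)
print(block, pos)
```

Finding record boundaries *within* that byte range is the job of the file format and the MapReduce InputFormat (e.g. splitting text on newlines), not of HDFS itself, which only stores and serves opaque byte ranges.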
