Hi, I have a question about HDFS. It may be trivial, but I have not been able to understand it.
Since HDFS stores files in blocks of 64 or 128 MB, how does it split files? The problem I see is this. Suppose my input file contains a long line such as:

672364,423746273,4234234,2,342,34,2,34,234,2,34,234,2,342,342

This line should be processed in a single map call. But because of the block boundary, one part of the line ends up in one block and the rest in the next:

Block1 (goes to one mapper process):
672364,423746273,4234
<end of block1>

Block2 (goes to another mapper process):
234,2,342,34,2,34,234,2,34,234,2,342,342

How does HDFS avoid this scenario?

Thanks and Regards
Utkarsh Gupta
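P.S. To make the concern concrete, here is a minimal Python sketch of the situation I have in mind. It is not Hadoop code: the 64-byte block size (standing in for 64 MB), the `split_into_blocks` helper, and the filler bytes representing earlier records are all assumptions made up for illustration. It only shows that cutting a file at fixed byte offsets, with no knowledge of line boundaries, leaves one record spread over two blocks.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size chunks, ignoring record (line) boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Hypothetical block size: 64 bytes here, standing in for a 64 MB HDFS block.
BLOCK_SIZE = 64

# 43 bytes of filler standing in for earlier records in the file (assumption).
earlier_records = b"0" * 42 + b"\n"

# The line from my example, which should reach a single map call intact.
line = b"672364,423746273,4234234,2,342,34,2,34,234,2,34,234,2,342,342"

file_bytes = earlier_records + line + b"\n"
blocks = split_into_blocks(file_bytes, BLOCK_SIZE)

# The tail of the first block and the whole second block each hold
# only a fragment of the line.
print(blocks[0][len(earlier_records):])
print(blocks[1])
```

With these made-up sizes, the first block ends in the middle of the field 4234234 and the second block begins with the remainder of that field.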