Hi,

I have a question about HDFS that may be trivial, but I have not been able
to work it out.

Since HDFS stores files in blocks of 64/128 MB, how does it split files?
The problem I see is this: suppose my input file contains a long line such
as:

672364,423746273,4234234,2,342,34,2,34,234,2,34,234,2,342,342

This line should be processed in a single map call. But because of the block
boundary, part of the line ends up in one block and the rest in the next.

Block1:
--
-
-    (this block goes to one mapper process)
-
-
672364,423746273,4234
<end of block1>

Block2:
234,2,342,34,2,34,234,2,34,234,2,342,342
-
-
-    (this block goes to another mapper process)
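
To make the concern concrete, here is a minimal sketch in Python. It is not how HDFS works internally. It only cuts a small byte string at a fixed size, the way I imagine a block boundary falling in the middle of a line. The 27-byte block size, the helper name split_into_blocks, and the neighbouring lines "1,2,3" and "4,5,6" are made up for illustration.

```python
# Illustrative only: a tiny "file" and a tiny block size stand in for a real
# HDFS file and its 64/128 MB blocks.
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut data into fixed-size chunks, ignoring line boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# The long line from the example above, surrounded by two made-up lines.
record = b"672364,423746273,4234234,2,342,34,2,34,234,2,34,234,2,342,342\n"
data = b"1,2,3\n" + record + b"4,5,6\n"

# A block size of 27 bytes is chosen so the cut lands inside the long line.
blocks = split_into_blocks(data, block_size=27)

for number, block in enumerate(blocks, start=1):
    print(f"Block{number}: {block!r}")
```

No single block contains the whole line, so a mapper that only looked at its own block would see an incomplete record.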


How does HDFS avoid this scenario?

Thanks and Regards
Utkarsh Gupta


