I was just wondering if this is normal behavior. I assume it might be. I kept copying a 1.6 GB file into HDFS, and each copy made node 1 bigger and bigger until it finally filled up and HDFS started using node 2. I was rather hoping it would write to both nodes.
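For reference, something along these lines prints which datanodes hold each block of a file, using the standard FileSystem API (the path is hypothetical):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/user/dean/bigfile.dat"); // hypothetical path
            FileStatus stat = fs.getFileStatus(p);
            // One BlockLocation per HDFS block; getHosts() names the
            // datanodes holding a replica of that block.
            for (BlockLocation b : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
                System.out.println("offset " + b.getOffset() + " -> "
                        + Arrays.toString(b.getHosts()));
            }
            fs.close();
        }
    }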
First Question: Is that normal behavior? (I know I can keep rebalancing, but doing that manually gets tedious.)

Second Question: We have huge files sent to us every night over FTP, and we will most likely mount HDFS from Linux so that each file is written into HDFS as it arrives. Is there a way to configure HDFS to write some of the file to one node and some of it to another node? (See the first sketch in the P.S. below.)

Third Question: If the second is possible, I am hoping MapReduce can cope with splitting the file. (I copied and modified LineRecordReader to suit our needs; the key is not the line number of the file but is simply generated.) Ideally I am after more parallelization here, so that these big files are processed on multiple nodes and the map tasks run close to where the data was written (and is processed after the write), though of course far from the input files, which are most likely not on the same nodes as where the data will be stored. (See the second sketch in the P.S. below.)

Thanks for any input on this,
Dean
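P.S. In case it helps anyone answering, here is a minimal sketch of what I imagine for the second question: raising the replication factor when the file is copied in, so each block is written to more than one node. The paths are placeholders, and the factor of 2 just matches our two-node setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyWithReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // With dfs.replication > 1, every block is written to that many
            // datanodes as part of the write pipeline; the first replica
            // still lands on the local node if the client is a datanode.
            conf.set("dfs.replication", "2");
            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(new Path("/incoming/nightly.dat"),   // hypothetical paths
                                 new Path("/user/dean/nightly.dat"));
            fs.close();
        }
    }

As far as I understand, fs.setReplication(path, (short) 2) should do the same for files already in HDFS.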
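And a second sketch for the MapReduce side of my third question, assuming the org.apache.hadoop.mapreduce API; the class name is made up and stands in for the input format around my modified LineRecordReader:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Hypothetical name for our custom input format.
    public class GeneratedKeyInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            // Plain (uncompressed) text can be split per HDFS block, so the
            // framework creates one map task per block and tries to run it
            // on a node that holds that block.
            return true;
        }
    }

My understanding is that FileInputFormat hands out one split per HDFS block and the scheduler prefers running each map task on a node holding that block, which is exactly the locality I am after.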