I was just wondering if this is normal behavior. I assume it might be. I kept copying a 1.6 GB file into HDFS, and each copy made node 1 bigger and bigger until it finally filled up and HDFS started using node 2. I was rather hoping it would write to both nodes.
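For reference, something along these lines prints which datanodes hold each block of a file, using the standard FileSystem API (the path is hypothetical):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/user/dean/bigfile.dat"); // hypothetical path
            FileStatus stat = fs.getFileStatus(p);
            // One BlockLocation per HDFS block; getHosts() names the
            // datanodes holding a replica of that block.
            for (BlockLocation b : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
                System.out.println("offset " + b.getOffset() + " -> "
                        + Arrays.toString(b.getHosts()));
            }
            fs.close();
        }
    }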
First Question: Is that normal behavior? (I know I can keep rebalancing, but doing that manually gets tedious.)

Second Question: We have huge files sent to us every night over FTP, and we will most likely mount HDFS from Linux so that each file is written into HDFS as it arrives. Is there a way to configure HDFS to write some of the file to one node and some of it to another node? (See the first sketch in the P.S. below.)

Third Question: If the second is possible, I am hoping MapReduce can cope with splitting the file. (I copied and modified LineRecordReader to suit our needs; the key is not the line number of the file but is simply generated.) Ideally I am after more parallelization here, so that these big files are processed on multiple nodes and the map tasks run close to where the data was written (and is processed after the write), though of course far from the input files, which are most likely not on the same nodes as where the data will be stored. (See the second sketch in the P.S. below.)

Thanks for any input on this,
Dean
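P.S. In case it helps anyone answering, here is a minimal sketch of what I imagine for the second question: raising the replication factor when the file is copied in, so each block is written to more than one node. The paths are placeholders, and the factor of 2 just matches our two-node setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyWithReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // With dfs.replication > 1, every block is written to that many
            // datanodes as part of the write pipeline; the first replica
            // still lands on the local node if the client is a datanode.
            conf.set("dfs.replication", "2");
            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(new Path("/incoming/nightly.dat"),   // hypothetical paths
                                 new Path("/user/dean/nightly.dat"));
            fs.close();
        }
    }

As far as I understand, fs.setReplication(path, (short) 2) should do the same for files already in HDFS.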
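And a second sketch for the MapReduce side of my third question, assuming the org.apache.hadoop.mapreduce API; the class name is made up and stands in for the input format around my modified LineRecordReader:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Hypothetical name for our custom input format.
    public class GeneratedKeyInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            // Plain (uncompressed) text can be split per HDFS block, so the
            // framework creates one map task per block and tries to run it
            // on a node that holds that block.
            return true;
        }
    }

My understanding is that FileInputFormat hands out one split per HDFS block and the scheduler prefers running each map task on a node holding that block, which is exactly the locality I am after.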