A few queries regarding the way data is loaded into HDFS.

- Is it common practice to load data into HDFS only through the master node? With a two-slave configuration we are able to copy only around 35 logs (64 KB each) per minute.
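For context, each log is pushed with something roughly equivalent to the sketch below (an assumed workflow; the paths, hostname, and port are placeholders for our actual setup):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogUploader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the NameNode; the URI is a placeholder.
            conf.set("fs.default.name", "hdfs://master:9000");
            FileSystem fs = FileSystem.get(conf);
            // Copy one ~64 KB log from local disk into HDFS.
            fs.copyFromLocalFile(new Path("/local/logs/app.log"),
                                 new Path("/logs/app.log"));
            fs.close();
        }
    }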
- We are concerned about the time it takes to update filenames and block maps on the master node when data is loaded from a few or all of the slave nodes. Can anyone tell me how long this update generally takes?

One more question: what happens if a node crashes soon after data has been copied onto it? How is data consistency maintained in that case?

Thanks in advance,
Venkates P B
