Jay Pound wrote:
1.) we need to split up chunks of data into sub-folders so as not to run the
filesystem up against its limit on the number of files in a single
directory, like the way squid splits up its data into directories.

I agree. I am currently using ReiserFS with NDFS, so this is not a priority, but long-term it should be fixed. Please file a bug report and, ideally, contribute a patch.
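For reference, a patch might hash each block id into a fixed two-level directory tree, squid-style. A minimal sketch in that direction (the blk_ naming, the bucket counts, and the helper itself are illustrative, not actual NDFS code):

import java.io.File;

public class BlockDirs {
  // Illustrative helper: map a block id to a two-level subdirectory,
  // squid-style, so no single directory accumulates too many files.
  public static File fileForBlock(File dataDir, long blockId) {
    int d1 = (int) (blockId & 0xFF);          // first-level bucket: 256 dirs
    int d2 = (int) ((blockId >>> 8) & 0xFF);  // second-level bucket: 256 dirs
    File dir = new File(dataDir, d1 + File.separator + d2);
    dir.mkdirs();                             // create buckets lazily
    return new File(dir, "blk_" + blockId);
  }
}

With 256 x 256 buckets, even tens of millions of blocks stay well under typical per-directory file limits.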

2.)when a datanode is set to store data on a nfs share / samba share [...]

That is not a recommended configuration.

A datanode should reasonably handle disk failures. Developing and debugging this may take time, however; I'm not yet sure how disk failures appear to a JVM. Things are currently written so that, if an exception is thrown during disk I/O, the datanode should take itself offline, initiating replication of its data. We'll see whether that's sufficient.
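To make that policy concrete, here it is in sketch form; the class and method names are invented for illustration and do not match the actual datanode code:

import java.io.IOException;

public class DataNodeSketch {
  private volatile boolean offline = false;

  // Illustrative policy: any exception during disk I/O takes this
  // datanode offline; the namenode then re-replicates its blocks.
  public void writeBlock(byte[] data) {
    if (offline)
      throw new IllegalStateException("datanode is offline");
    try {
      writeToDisk(data);   // hypothetical disk write
    } catch (IOException e) {
      offline = true;      // stop accepting requests
      shutdown();          // stop heartbeats so replication kicks in
    }
  }

  private void writeToDisk(byte[] data) throws IOException { /* ... */ }
  private void shutdown() { /* stop serving, stop heartbeats */ }
}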

3.) we need to set a limit on how much of the filesystem can be used by NDFS,
or a max number of 32MB chunks to be stored. When a single machine runs out of
space, the same thing happens as in #2: NDFS hangs waiting to write data to
that particular datanode instead of transmitting data to the other datanodes.

The max storage per datanode was once configurable, but we found that difficult to manage, as it required separate configuration per datanode whenever datanodes have different devices. So now all space on the device is assumed to be available to NDFS. Making this optionally configurable would probably be better. Please file a bug report and, ideally, contribute a patch.
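Such a patch might add an optional property that caps NDFS's usage, falling back to all free space on the device when the property is unset. A sketch under those assumptions (the property key and the helper are hypothetical):

import java.io.File;
import java.util.Properties;

public class CapacitySketch {
  // Hypothetical key; when unset, all free space on the device is offered.
  static final String MAX_BYTES_KEY = "ndfs.max.storage.bytes";

  // How much space this datanode may still offer to NDFS.
  static long availableBytes(Properties conf, File dataDir, long bytesUsed) {
    long free = dataDir.getUsableSpace();   // default: the whole device
    String max = conf.getProperty(MAX_BYTES_KEY);
    if (max == null)
      return free;
    long remaining = Long.parseLong(max) - bytesUsed;  // honor the cap
    return Math.min(free, Math.max(remaining, 0L));
  }
}

Datanodes with identical devices could then share one configuration, while a datanode with a smaller or shared disk sets its own cap.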

Doug
