Jay Pound wrote:
> 1.) we need to split up chunks of data into sub-folders so as not to run
> the filesystem up against its physical limit on the number of files in a
> single directory, the way squid splits up its data into directories.
I agree. I am currently using ReiserFS with NDFS, so this is not a
priority, but long-term it should be fixed. Please file a bug report,
and, ideally, contribute a patch.
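Something along these lines would do it (a rough sketch only; the class
and names below are made up for illustration, not actual NDFS code).
The idea is to hash the low bits of the block id into two levels of
subdirectories, squid-style, so no single directory grows without bound:

  import java.io.File;

  public class BlockDirs {
    private final File root;

    public BlockDirs(File root) { this.root = root; }

    // Map a block id to root/xx/yy/blk_<id>: 256*256 buckets, so each
    // directory holds only a tiny fraction of the stored blocks.
    public File fileForBlock(long blockId) {
      int b1 = (int) (blockId & 0xFF);
      int b2 = (int) ((blockId >>> 8) & 0xFF);
      File dir = new File(new File(root, String.format("%02x", b1)),
                          String.format("%02x", b2));
      dir.mkdirs();                     // create bucket dirs lazily
      return new File(dir, "blk_" + blockId);
    }
  }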
> 2.) when a datanode is set to store data on an nfs share / samba share [...]
That is not a recommended configuration.
A datanode should reasonably handle disk failures. Developing and
debugging this may take time, however. I'm not yet sure how disk
failures appear to a JVM. Things are currently written so that if an
exception is thrown during disk i/o then the datanode should take itself
offline, initiating replication of its data. We'll see if that's
sufficient.
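To make that concrete, here is a minimal sketch of the intended policy
(hypothetical names, not the real datanode code): any IOException
during a block write marks the node offline, after which its blocks
would be re-replicated from the other copies.

  import java.io.IOException;

  public abstract class DataNodeSketch {
    private volatile boolean offline = false;

    protected abstract void writeBlockToDisk(long blockId, byte[] data)
        throws IOException;
    protected abstract void stopHeartbeats(); // hypothetical: deregister

    public void storeBlock(long blockId, byte[] data) throws IOException {
      if (offline) {
        throw new IOException("datanode is offline");
      }
      try {
        writeBlockToDisk(blockId, data);
      } catch (IOException e) {
        // Treat any disk i/o failure as a bad device: take the node
        // offline so replication of its data is initiated elsewhere.
        offline = true;
        stopHeartbeats();
        throw e;
      }
    }
  }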
> 3.) we need to set a limit on how much of the filesystem can be used by
> ndfs, or a max # of 32mb chunks to be stored; when a single machine runs
> out of space the same thing happens as in #2: ndfs hangs waiting to write
> data to that particular datanode instead of transmitting data to the
> other datanodes
The max storage per datanode used to be configurable, but we found that
difficult to manage, as it required separate configuration for each
datanode whenever datanodes had different devices. So now all space on
the device is assumed to be available to NDFS. Making this optionally
configurable again would probably be better. Please file a bug report,
and, ideally, contribute a patch.
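If someone picks this up, the optional cap might look like the sketch
below (all names invented for illustration): refuse new blocks once a
configured byte limit would be exceeded, and fall back to whole-device
space when no cap is set.

  import java.io.File;

  public class SpaceChecker {
    private final File dataDir;
    private final long maxBytes; // 0 = no cap: whole device is available

    public SpaceChecker(File dataDir, long maxBytes) {
      this.dataDir = dataDir;
      this.maxBytes = maxBytes;
    }

    // true if a block of the given size may still be stored here
    public boolean hasRoomFor(long blockSize) {
      if (maxBytes > 0 && usedBytes() + blockSize > maxBytes) {
        return false; // cap reached; another datanode should be chosen
      }
      return dataDir.getUsableSpace() > blockSize; // device must have room
    }

    // sum of block file sizes currently in the data directory
    private long usedBytes() {
      long total = 0;
      File[] files = dataDir.listFiles();
      if (files != null) {
        for (File f : files) {
          total += f.length();
        }
      }
      return total;
    }
  }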
Doug