Hello,
I was trying to set up NDFS on my notebook today and ran into some problems:
1) The documentation at http://wiki.apache.org/nutch/NutchDistributedFileSystem is a good start, but it is somewhat outdated: the examples use old package names, the config file properties are not mentioned, and some tools take a different number or format of command-line parameters. I can update it if no one objects, but I want to make sure that simply editing the wiki page content would be enough. Is any other step required to get it updated?
2) NUTCH-46: on the Windows platform there are problems with handling file separators. Nutch uses java.io.File objects but sometimes builds paths by appending strings that follow the Unix path convention. I will try to make it work on both platforms (Windows is my company's standard development machine, so this is a real problem for me). I will post a patch, or ask further questions if the required changes turn out not to be trivial.
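To illustrate the kind of separator problem meant above, here is a minimal sketch in plain Java. It is not Nutch code: the class and method names are hypothetical. It contrasts building a path by appending "/" with letting java.io.File insert File.separator. On Unix the two strings agree; on Windows File.getPath() uses "\", so string comparisons against a "/"-built path fail. It also shows a separator-independent "is this path under that directory" check that walks parents instead of comparing string prefixes.

```java
import java.io.File;

// Hypothetical illustration of the path-separator pitfall; not Nutch code.
public class PathJoinExample {

    // Fragile: hard-codes the Unix separator into the path string.
    static String joinWithSlash(String parent, String child) {
        return parent + "/" + child;
    }

    // Portable: java.io.File inserts File.separator for the current platform.
    static String joinWithFile(String parent, String child) {
        return new File(parent, child).getPath();
    }

    // Separator-independent check that 'path' lies under 'root':
    // walk up the parents and compare File objects, not string prefixes.
    static boolean isUnder(File root, File path) {
        for (File f = path; f != null; f = f.getParentFile()) {
            if (f.equals(root)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String slashed = joinWithSlash("data", "blk_1");
        String portable = joinWithFile("data", "blk_1");
        // Prints true on Unix, false on Windows (where the separator is "\").
        System.out.println("same string? " + slashed.equals(portable));
        System.out.println("under data? " + isUnder(new File("data"), new File(portable)));
    }
}
```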
3) There are some minor NDFS issues that trip up beginners, and I would like to provide patches for them.
For example:
a) Not all command-line options are displayed when TestClient prints its usage information.
b) TestClient does not always check whether a command failed.
c) It is not easy to start two DataNode instances on one machine (for testing). I would like to add command-line options for starting a DataNode in a given directory on a given port; these options would take precedence over the config entries.
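A rough sketch of the precedence rule proposed in (c), in plain Java. Everything here is hypothetical: the option names (-dir, -port), the property keys and the defaults are placeholders, and java.util.Properties stands in for the real Nutch configuration. Values are read from the config first and then overridden by any matching command-line flags, which is what would let a second DataNode run on the same machine.

```java
import java.util.Properties;

// Hypothetical sketch: command-line options override config entries.
// Option names, property keys and defaults are placeholders, not Nutch's.
public class DataNodeOptions {
    final String dataDir;
    final int port;

    DataNodeOptions(String dataDir, int port) {
        this.dataDir = dataDir;
        this.port = port;
    }

    static DataNodeOptions parse(String[] args, Properties conf) {
        // 1. Start from the configuration (hypothetical keys and defaults).
        String dir = conf.getProperty("example.datanode.dir", "/tmp/ndfs/data");
        int port = Integer.parseInt(conf.getProperty("example.datanode.port", "7000"));

        // 2. Command-line flags, when present, take precedence.
        for (int i = 0; i < args.length; i++) {
            if ("-dir".equals(args[i]) && i + 1 < args.length) {
                dir = args[++i];
            } else if ("-port".equals(args[i]) && i + 1 < args.length) {
                port = Integer.parseInt(args[++i]);
            } else {
                throw new IllegalArgumentException("Unknown or incomplete option: " + args[i]);
            }
        }
        return new DataNodeOptions(dir, port);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("example.datanode.dir", "/tmp/ndfs/data1");
        conf.setProperty("example.datanode.port", "7000");

        // A second instance on the same machine overrides both values.
        DataNodeOptions second =
            parse(new String[] {"-dir", "/tmp/ndfs/data2", "-port", "7001"}, conf);
        System.out.println(second.dataDir + " " + second.port);
    }
}
```

Run with the sample config, this prints `/tmp/ndfs/data2 7001`; with no arguments, parse() simply returns the config values.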
I am sure I will have more comments as I make progress in discovering NDFS secrets :).
I will work on this over the next few days and provide patches for review.
Regards,
Piotr
