Hello. I am currently developing a patch so that Nutch can be used as a job jar on Hadoop 0.17. The task turned out not to be that complicated: it mostly involved updating calls to deprecated methods that were removed in Hadoop 0.17 and parameterizing certain methods and classes, so the diff is not that long. If you could give me some advice or hints on the following, it would be much appreciated, since I could then finish the task and submit it to JIRA as a patch:

The build compiles, but two unit tests still fail and we cannot seem to find the cause. They are:

   * TestCrawlDbMerger.java
   * TestDeleteDuplicates.java

I have tracked down the failure in TestCrawlDbMerger to a difference in the fetchTimes of Url10 and Url20. The resulting value is consistently 10 seconds behind the expected one.

I have not yet had much opportunity to examine why TestDeleteDuplicates fails.

The diff of my changes is at this address: <http://pastie.caboo.se/204167>.

Thank you so much in advance,

Michael
