[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907297#action_12907297 ]
Andrzej Bialecki commented on NUTCH-893:
-----------------------------------------
Very good catch - yes, the test now passes for me too. This is actually good
news for Gora :) I'll continue digging into NUTCH-879 ... don't hesitate to
share any ideas on how to solve that. I suspect we may be losing keys in
Generator or Fetcher due to partitioning collisions, but this hypothesis needs
to be tested.
> DataStore.put() silently loses records when executed from multiple processes
> ----------------------------------------------------------------------------
>
> Key: NUTCH-893
> URL: https://issues.apache.org/jira/browse/NUTCH-893
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 2.0
> Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.4 x64, Sun JDK 1.6
> Reporter: Andrzej Bialecki
> Priority: Blocker
> Fix For: 2.0
>
> Attachments: NUTCH-893.patch, NUTCH-893_v2.patch
>
>
> In order to debug the issue described in NUTCH-879 I created a test that
> simulates multiple clients appending to webtable (please see the patch), which
> is the situation we have in distributed map-reduce jobs.
> There are two tests there: one that uses multiple threads within the same
> JVM, and another that uses a single thread in each of multiple JVMs. Each test
> first clears webtable (be careful!), then puts a bunch of pages, and finally
> verifies that all are present and that their values correspond to their keys.
> To make things more interesting, each execution context (thread or process)
> closes and reopens its instance of DataStore a few times.
> The multithreaded test passes just fine. However, the multi-process test
> fails with missing keys, as many as 30%.
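
For readers without the patch at hand, the multithreaded variant described
above can be sketched roughly as follows. This is only an illustration:
MultiWriterCheck, TABLE, key and valueFor are made-up names, and a
ConcurrentHashMap stands in for webtable, so nothing here is Gora's DataStore
API or the code in NUTCH-893.patch. The close-and-reopen step and the
multi-process variant are left out, since both need a real backing store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiWriterCheck {
    // Hypothetical stand-in for the shared webtable; NOT Gora's DataStore.
    static final Map<String, String> TABLE = new ConcurrentHashMap<>();

    // Keys are unique per writer so that no two writers overwrite each other.
    static String key(int writer, int i) {
        return "w" + writer + "-k" + i;
    }

    // The value is derived from the key so it can be verified afterwards.
    static String valueFor(String key) {
        return "value-of-" + key;
    }

    // Clears the table, lets `writers` threads put `perWriter` records each,
    // then returns how many expected keys are missing or hold a wrong value.
    static int run(int writers, int perWriter) throws InterruptedException {
        TABLE.clear();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            final int writer = w;
            pool.submit(() -> {
                for (int i = 0; i < perWriter; i++) {
                    String k = key(writer, i);
                    TABLE.put(k, valueFor(k));
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("writers did not finish in time");
        }

        int bad = 0;
        for (int w = 0; w < writers; w++) {
            for (int i = 0; i < perWriter; i++) {
                String k = key(w, i);
                if (!valueFor(k).equals(TABLE.get(k))) {
                    bad++;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("missing or wrong records: " + run(4, 1000));
    }
}
```

A non-zero return value would correspond to the "missing keys" failure reported
for the multi-process test; with an in-memory map shared by threads no loss is
expected, which matches the multithreaded test passing.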