[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-893:
--------------------------------
Attachment: NUTCH-893_v2.patch
Dogacan and I spent a fair amount of time figuring out the problem with this
test, and we checked and rechecked the code in gora-sql to make sure the
problem is not there. The actual issue is that TestGoraStorage#main() calls
setup(), which issues a deleteByQuery() to delete all the data in the store.
When testMultiProcess() fires up lots of processes, the early ones start
writing data (only some of which is committed), and the ones that start later
delete that newly written data. So this is a synchronization issue in the
test itself.
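To illustrate the race, here is a minimal sketch. It is not the actual
TestGoraStorage code: an in-memory map stands in for the store, the class and
method names are made up for illustration, and the "processes" run one after
another in a fixed worst-case order so that the outcome is deterministic. It
only shows why a setup() call in every process loses records, while a single
setup() before any process starts does not.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names do not come from the Nutch/Gora code.
class SetupRaceSketch {
    // Stand-in for the shared webtable (a real run would use a DataStore).
    static final Map<String, String> store = new TreeMap<>();

    // Plays the role of the test's setup(): deletes everything in the store.
    static void setup() {
        store.clear();
    }

    // One simulated process writes `count` records whose value equals the key.
    static void writeRecords(int processId, int count) {
        for (int i = 0; i < count; i++) {
            String key = "p" + processId + "-" + i;
            store.put(key, key);
        }
    }

    // Problematic pattern: every process calls setup() when it starts, so a
    // late starter wipes what earlier processes have already written.
    static int runWithPerProcessSetup(int processes, int perProcess) {
        store.clear();
        for (int p = 0; p < processes; p++) {
            setup();
            writeRecords(p, perProcess);
        }
        return store.size();
    }

    // Alternative pattern: the store is cleared once, before any process runs.
    static int runWithSingleSetup(int processes, int perProcess) {
        setup();
        for (int p = 0; p < processes; p++) {
            writeRecords(p, perProcess);
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("per-process setup: "
            + runWithPerProcessSetup(3, 10) + " of 30 records survive");
        System.out.println("single setup: "
            + runWithSingleSetup(3, 10) + " of 30 records survive");
    }
}
```

In this fixed ordering only the last process's 10 records survive in the first
case, and all 30 survive in the second. With real processes the interleaving
varies, so the number of lost records would vary from run to run.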
The newly uploaded patch passes the test. So I am afraid we need to update the
test to cover the issue in NUTCH-879. Andrzej, any suggestions on how to extend
the test to reproduce NUTCH-879?
In the meantime, I will port this test to Gora as part of
http://github.com/enis/gora/issues#issue/50. Thanks for the excellent patch.
> DataStore.put() silently loses records when executed from multiple processes
> ----------------------------------------------------------------------------
>
> Key: NUTCH-893
> URL: https://issues.apache.org/jira/browse/NUTCH-893
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 2.0
> Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.4 x64, Sun JDK 1.6
> Reporter: Andrzej Bialecki
> Priority: Blocker
> Fix For: 2.0
>
> Attachments: NUTCH-893.patch, NUTCH-893_v2.patch
>
>
> In order to debug the issue described in NUTCH-879 I created a test to
> simulate multiple clients appending to webtable (please see the patch), which
> is the situation that we have in distributed map-reduce jobs.
> There are two tests there: one that uses multiple threads within the same
> JVM, and another that uses a single thread in each of multiple JVMs. Each test
> first clears webtable (be careful!), then puts a bunch of pages, and finally
> checks that all are present and that their values correspond to their keys. To
> make things more interesting, each execution context (thread or process)
> closes and reopens its instance of DataStore a few times.
> The multithreaded test passes just fine. However, the multi-process test
> fails with missing keys, as many as 30%.