[jira] Updated: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-27 Thread Julien Nioche (JIRA)

 [ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-893:


Fix Version/s: 2.0
 Priority: Blocker  (was: Major)

Marking as blocker; this must be fixed for 2.0.

 DataStore.put() silently loses records when executed from multiple processes
 

 Key: NUTCH-893
 URL: https://issues.apache.org/jira/browse/NUTCH-893
 Project: Nutch
  Issue Type: Bug
Affects Versions: 2.0
 Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.04 x64, Sun JDK 1.6
Reporter: Andrzej Bialecki 
Priority: Blocker
 Fix For: 2.0

 Attachments: NUTCH-893.patch


 In order to debug the issue described in NUTCH-879, I created a test to
 simulate multiple clients appending to webtable (please see the patch), which
 is the situation we have in distributed map-reduce jobs.
 There are two tests: one uses multiple threads within the same JVM, and the
 other uses a single thread in each of multiple JVMs. Each test first clears
 webtable (be careful!), then puts a batch of pages, and finally checks that
 all of them are present and that their values correspond to their keys. To
 make things more interesting, each execution context (thread or process)
 closes and reopens its own DataStore instance a few times.
 The multithreaded test passes just fine. However, the multi-process test
 fails with up to 30% of the keys missing.
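 The test procedure described above (clear the table, write from several
 execution contexts that periodically close and reopen their store, then
 verify every key) can be sketched as a small harness. This is NOT the
 attached NUTCH-893.patch and does not use Gora's DataStore API: the
 KeyValueStore interface, open(), valueFor() and the in-memory TABLE are
 hypothetical stand-ins, so this version only models the multithreaded
 shape and will always report 0. To reproduce the reported loss, open()
 would have to return a handle on the real SqlStore-backed webtable, and the
 writer loop would have to run in separate JVMs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MultiWriterCheck {

    // Hypothetical minimal store interface; NOT Gora's DataStore API.
    interface KeyValueStore {
        void put(String key, String value);
        String get(String key);
        void close();
    }

    // Hypothetical shared backing table standing in for "webtable".
    static final Map<String, String> TABLE = new ConcurrentHashMap<>();

    // Each call returns a fresh handle over the shared table, mimicking a
    // client that opens its own store instance.
    static KeyValueStore open() {
        return new KeyValueStore() {
            public void put(String key, String value) { TABLE.put(key, value); }
            public String get(String key) { return TABLE.get(key); }
            public void close() { /* a real store would flush/commit here */ }
        };
    }

    // The value is derived from the key so the verifier can check correspondence.
    static String valueFor(String key) {
        return "page-" + key;
    }

    /** Runs the multi-writer test and returns the number of missing or wrong records. */
    static int countMissing(int writers, int keysPerWriter, int reopens) {
        TABLE.clear(); // step 1: clear the table before writing
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                // Close and reopen the handle roughly `reopens` times per writer.
                int chunk = Math.max(1, keysPerWriter / (reopens + 1));
                KeyValueStore store = open();
                for (int i = 0; i < keysPerWriter; i++) {
                    String key = id + "-" + i;
                    store.put(key, valueFor(key));
                    if ((i + 1) % chunk == 0) {
                        store.close();
                        store = open();
                    }
                }
                store.close();
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        // Final step: every key must be present with the value derived from it.
        KeyValueStore reader = open();
        int bad = 0;
        for (int w = 0; w < writers; w++) {
            for (int i = 0; i < keysPerWriter; i++) {
                String key = w + "-" + i;
                if (!valueFor(key).equals(reader.get(key))) {
                    bad++;
                }
            }
        }
        reader.close();
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("missing or wrong: " + countMissing(4, 1000, 3));
    }
}
```

 Returning a count rather than failing on the first bad key makes it easy to
 report a loss rate like the 30% seen in the multi-process run.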

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-25 Thread Andrzej Bialecki (JIRA)

 [ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-893:


Attachment: NUTCH-893.patch

Unit test to illustrate the issue.

 DataStore.put() silently loses records when executed from multiple processes
 

 Key: NUTCH-893
 URL: https://issues.apache.org/jira/browse/NUTCH-893
 Project: Nutch
  Issue Type: Bug
Affects Versions: 2.0
 Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.04 x64, Sun JDK 1.6
Reporter: Andrzej Bialecki 
 Attachments: NUTCH-893.patch


