Re: Alternative search box for Nutch site

2010-08-30 Thread Otis Gospodnetic
Hello peeps,

We've created a patch for Tika and got some good and constructive feedback (see 
https://issues.apache.org/jira/browse/TIKA-488 ).

Should we follow the same functionality pattern for nutch.apache.org as seen in 
TIKA-488?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
 From: Otis Gospodnetic ogjunk-nu...@yahoo.com
 To: dev@nutch.apache.org
 Sent: Mon, August 9, 2010 4:49:18 PM
 Subject: Alternative search box for Nutch site
 
 Hello,
 (sending this to d...@nutch instead of old nutch-...@lucene)
 
 Over at http://search-lucene.com we index Nutch's mailing lists, wiki, web
 site, source code, javadoc, jira...
 
 Would the community be interested in a patch that adds another search option
 to the search box on nutch.apache.org?
 
 I happened to try a few searches from nutch.a.o just now (now: yesterday)
 and I got stuff like this:
 
   Found 189 results in 6.211 seconds. Displaying page 1 of 19, sorted by
   Found 12,808 results in 64.342 seconds. Displaying page 1 of 1,281, sorted by
 
 Note the times.  Ouch!
 This makes me think having an alternative option would be a good thing to
 have.
 
 Assuming people are for this, any suggestions for how the search should
 function by default or any specific instructions for how the search box
 should be modified would be great!
 
 Thanks,
 Otis
 


Re: Alternative search box for Nutch site

2010-08-30 Thread Andrzej Bialecki

On 2010-08-30 12:21, Otis Gospodnetic wrote:

Hello peeps,

We've created a patch for Tika and got some good and constructive feedback (see
https://issues.apache.org/jira/browse/TIKA-488 ).

Should we follow the same functionality pattern for nutch.apache.org as seen in
TIKA-488?


Sure, why not - when preparing the patch let's follow the same 
rationales as those in TIKA-488, since they are applicable here too.



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



[jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-30 Thread Andrzej Bialecki (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904226#action_12904226 ]

Andrzej Bialecki commented on NUTCH-893:
----------------------------------------

Dogacan, flush() doesn't help - there are still missing keys. What's 
interesting is that the missing keys form sequential ranges. Could this be 
perhaps an issue with connection management, or some synchronization issue?
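
For anyone trying to reproduce the "sequential ranges" observation, here is a
minimal sketch of how missing keys could be collapsed into contiguous ranges.
It assumes the test keys can be mapped to consecutive integers 0..N-1; the
class and method names are hypothetical and are not part of the attached
patch or of Gora.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from NUTCH-893.patch.
class MissingKeyRanges {
    /**
     * Returns inclusive [start, end] ranges of keys in [0, expected) that are
     * absent from the set of keys actually found in the store.
     */
    static List<long[]> missingRanges(long expected, Set<Long> found) {
        List<long[]> ranges = new ArrayList<>();
        long start = -1; // -1 means "not currently inside a gap"
        for (long k = 0; k < expected; k++) {
            boolean missing = !found.contains(k);
            if (missing && start < 0) {
                start = k; // a gap opens at this key
            } else if (!missing && start >= 0) {
                ranges.add(new long[] {start, k - 1}); // gap ended at k-1
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new long[] {start, expected - 1}); // gap runs to the end
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Illustrative data only: keys 3..5 and 8..9 were never found.
        Set<Long> found = new HashSet<>(Arrays.asList(0L, 1L, 2L, 6L, 7L));
        for (long[] r : missingRanges(10, found)) {
            System.out.println("missing " + r[0] + ".." + r[1]);
        }
    }
}
```

With the illustrative data in main() this prints "missing 3..5" and
"missing 8..9". Long runs of this kind would be consistent with whole batches
being dropped rather than individual puts, though that is only a guess.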

 DataStore.put() silently loses records when executed from multiple processes
 

              Key: NUTCH-893
              URL: https://issues.apache.org/jira/browse/NUTCH-893
          Project: Nutch
       Issue Type: Bug
 Affects Versions: 2.0
      Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.4 x64, Sun JDK 1.6
         Reporter: Andrzej Bialecki
         Priority: Blocker
          Fix For: 2.0

      Attachments: NUTCH-893.patch


 In order to debug the issue described in NUTCH-879 I created a test to
 simulate multiple clients appending to webtable (please see the patch), which
 is the situation that we have in distributed map-reduce jobs.
 There are two tests there: one that uses multiple threads within the same
 JVM, and another that uses a single thread in each of multiple JVMs. Each
 test first clears webtable (be careful!), then puts a bunch of pages, and
 finally verifies that all of them are present and that their values
 correspond to their keys. To make things more interesting, each execution
 context (thread or process) closes and reopens its instance of DataStore a
 few times.
 The multithreaded test passes just fine. However, the multi-process test
 fails with missing keys, as many as 30%.
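
 The shape of the multithreaded variant described above can be sketched
 roughly as follows. This is not the code from NUTCH-893.patch: a plain
 java.util.Map stands in for the Gora DataStore, the key and value formats are
 made up, and the close/reopen step and the multi-process variant are left out
 because they depend on the real store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the test described above; Map replaces DataStore.
class ConcurrentPutCheck {
    /** Value stored for a key; the check recomputes it to detect mismatches. */
    static String valueFor(String key) {
        return "value-of-" + key;
    }

    /**
     * Runs `writers` threads that each put `perWriter` distinct keys, then
     * returns how many expected keys are missing or hold a wrong value.
     */
    static int runAndCountBad(Map<String, String> store, int writers, int perWriter) {
        store.clear(); // mirrors the "clears webtable" step; destructive
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            final int id = w;
            pool.execute(() -> {
                for (int i = 0; i < perWriter; i++) {
                    String key = "writer" + id + "-page" + i;
                    store.put(key, valueFor(key));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // Verification pass: every key must be present with its derived value.
        int bad = 0;
        for (int w = 0; w < writers; w++) {
            for (int i = 0; i < perWriter; i++) {
                String key = "writer" + w + "-page" + i;
                if (!valueFor(key).equals(store.get(key))) {
                    bad++;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        int bad = runAndCountBad(new ConcurrentHashMap<>(), 4, 1000);
        System.out.println("missing or wrong: " + bad);
    }
}
```

 Deriving each value from its key is what lets the final pass catch both lost
 records and records written under the wrong key.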

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes

2010-08-30 Thread JIRA

[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904316#action_12904316 ]

Doğacan Güney commented on NUTCH-893:
-------------------------------------

The code already calls close(), so if flush() doesn't help, then yeah, this
sounds like an issue with connection management or synchronization. I'll test
what happens if we change the SqlStore logic to not buffer statements at all
and instead execute them directly.
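
The difference between the two write modes mentioned above can be illustrated
with a small sketch. It says nothing about how SqlStore is actually
implemented: the class names are hypothetical, and a Consumer<String> stands
in for whatever executes a statement against the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; these are not Gora or SqlStore classes.
class WriteModes {
    /** Holds statements in memory and hands them to the sink only on flush(). */
    static final class BufferingWriter {
        private final List<String> pending = new ArrayList<>();
        private final Consumer<String> sink;

        BufferingWriter(Consumer<String> sink) {
            this.sink = sink;
        }

        void write(String statement) {
            pending.add(statement); // nothing reaches the sink yet
        }

        void flush() {
            pending.forEach(sink); // everything reaches the sink in one go
            pending.clear();
        }
    }

    /** Hands every statement to the sink as soon as it is written. */
    static final class DirectWriter {
        private final Consumer<String> sink;

        DirectWriter(Consumer<String> sink) {
            this.sink = sink;
        }

        void write(String statement) {
            sink.accept(statement);
        }
    }

    public static void main(String[] args) {
        List<String> executedBuffered = new ArrayList<>();
        BufferingWriter buffered = new BufferingWriter(executedBuffered::add);
        buffered.write("put page-1");
        buffered.write("put page-2");
        System.out.println("buffered, before flush: " + executedBuffered.size());
        buffered.flush();
        System.out.println("buffered, after flush: " + executedBuffered.size());

        List<String> executedDirect = new ArrayList<>();
        DirectWriter direct = new DirectWriter(executedDirect::add);
        direct.write("put page-1");
        System.out.println("direct, after one write: " + executedDirect.size());
    }
}
```

If records still go missing with direct execution, the buffering itself is
probably not the cause and the connection handling becomes the main suspect.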
