Re: Alternative search box for Nutch site
Hello peeps,

We've created a patch for Tika and got some good and constructive feedback (see https://issues.apache.org/jira/browse/TIKA-488 ). Should we follow the same functionality pattern for nutch.apache.org as seen in TIKA-488?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Otis Gospodnetic ogjunk-nu...@yahoo.com
To: dev@nutch.apache.org
Sent: Mon, August 9, 2010 4:49:18 PM
Subject: Alternative search box for Nutch site

Hello,

(sending this to d...@nutch instead of old nutch-...@lucene)

Over at http://search-lucene.com we index Nutch's mailing lists, wiki, web site, source code, javadoc, jira... Would the community be interested in a patch that adds another search option to the search box on nutch.apache.org?

I happened to try a few searches from nutch.a.o just now (now: yesterday) and I got stuff like this:

Found 189 results in 6.211 seconds. Displaying page 1 of 19, sorted by
Found 12,808 results in 64.342 seconds. Displaying page 1 of 1,281, sorted by

Note the times. Ouch! This makes me think having an alternative option would be a good thing to have.

Assuming people are for this, any suggestions for how the search should function by default or any specific instructions for how the search box should be modified would be great!

Thanks,
Otis
Re: Alternative search box for Nutch site
On 2010-08-30 12:21, Otis Gospodnetic wrote:
> Hello peeps,
>
> We've created a patch for Tika and got some good and constructive feedback (see https://issues.apache.org/jira/browse/TIKA-488 ). Should we follow the same functionality pattern for nutch.apache.org as seen in TIKA-488?

Sure, why not - when preparing the patch let's follow the same rationales as those in TIKA-488, since they are applicable here too.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
[jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes
[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904226#action_12904226 ]

Andrzej Bialecki commented on NUTCH-893:

Dogacan, flush() doesn't help - there are still missing keys. What's interesting is that the missing keys form sequential ranges. Could this be perhaps an issue with connection management, or some synchronization issue?

DataStore.put() silently loses records when executed from multiple processes

Key: NUTCH-893
URL: https://issues.apache.org/jira/browse/NUTCH-893
Project: Nutch
Issue Type: Bug
Affects Versions: 2.0
Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.4 x64, Sun JDK 1.6
Reporter: Andrzej Bialecki
Priority: Blocker
Fix For: 2.0
Attachments: NUTCH-893.patch

In order to debug the issue described in NUTCH-879 I created a test to simulate multiple clients appending to webtable (please see the patch), which is the situation that we have in distributed map-reduce jobs.

There are two tests there: one that uses multiple threads within the same JVM, and another that uses single thread in multiple JVMs. Each test first clears webtable (be careful!), and then puts a bunch of pages, and finally counts that all are present and their values correspond to keys. To make things more interesting each execution context (thread or process) closes and reopens its instance of DataStore a few times.

The multithreaded test passes just fine. However, the multi-process test fails with missing keys, as many as 30%.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
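
For readers without the attached patch at hand, the shape of the check described in the issue (several writers put pages, then every key is verified and gaps are reported) might look roughly like the sketch below. It is not the code from NUTCH-893.patch and does not use the Gora DataStore API: `InMemoryStore`, `MultiWriterCheck`, `valueFor` and the key layout are hypothetical names invented for illustration, and only the multithreaded variant is modelled (no close/reopen cycles, no separate JVMs). The range report is there because the comment above notes that the missing keys form sequential ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the real store; NOT the Gora DataStore API. */
final class InMemoryStore {
    private final Map<Integer, String> rows = new ConcurrentHashMap<>();

    void put(int key, String value) { rows.put(key, value); }

    String get(int key) { return rows.get(key); }
}

final class MultiWriterCheck {
    /** The value each key is expected to carry, so mismatches can be detected. */
    static String valueFor(int key) { return "page-" + key; }

    /** Starts one thread per writer; each writes its own contiguous slice of keys. */
    static void writeAll(InMemoryStore store, int writers, int keysPerWriter) {
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            final int start = w * keysPerWriter;
            Thread t = new Thread(() -> {
                for (int k = start; k < start + keysPerWriter; k++) {
                    store.put(k, valueFor(k));
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
    }

    /** Returns missing or mismatched keys as inclusive {first, last} ranges. */
    static List<int[]> missingRanges(InMemoryStore store, int totalKeys) {
        List<int[]> ranges = new ArrayList<>();
        int runStart = -1; // -1 means "not currently inside a gap"
        for (int k = 0; k < totalKeys; k++) {
            boolean ok = valueFor(k).equals(store.get(k));
            if (!ok && runStart < 0) {
                runStart = k;
            } else if (ok && runStart >= 0) {
                ranges.add(new int[] {runStart, k - 1});
                runStart = -1;
            }
        }
        if (runStart >= 0) {
            ranges.add(new int[] {runStart, totalKeys - 1});
        }
        return ranges;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        writeAll(store, 4, 250);
        for (int[] r : missingRanges(store, 1000)) {
            System.out.println("missing keys " + r[0] + ".." + r[1]);
        }
    }
}
```

Reporting gaps as ranges rather than as a bare count makes it easier to see whether losses line up with one writer's slice or with a reopen boundary, which is the kind of pattern the comment points at.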
[jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes
[ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904316#action_12904316 ]

Doğacan Güney commented on NUTCH-893:

The code already calls close() so if flush() doesn't help, then yeah, this sounds like an issue with connection management or synchronization. I'll test what happens if we change SqlStore logic to not buffer statements at all, instead directly execute them.
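
To make the distinction in the comment above concrete, here is a minimal sketch of a writer that buffers statements until flush() or close(), next to one that executes each statement directly. It is not SqlStore's implementation and says nothing about the actual cause of NUTCH-893; `RecordingSink`, `BufferingWriter` and `DirectWriter` are hypothetical classes invented for illustration, with a plain list standing in for the database connection.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sink standing in for a database connection. */
final class RecordingSink {
    final List<String> executed = new ArrayList<>();

    void execute(String statement) { executed.add(statement); }
}

/** Holds statements in memory and only executes them on flush() or close(). */
final class BufferingWriter {
    private final RecordingSink sink;
    private final List<String> pending = new ArrayList<>();

    BufferingWriter(RecordingSink sink) { this.sink = sink; }

    void put(String statement) { pending.add(statement); }

    void flush() {
        for (String s : pending) {
            sink.execute(s);
        }
        pending.clear();
    }

    void close() { flush(); }
}

/** Executes every statement as soon as it is put; there is nothing to flush. */
final class DirectWriter {
    private final RecordingSink sink;

    DirectWriter(RecordingSink sink) { this.sink = sink; }

    void put(String statement) { sink.execute(statement); }

    void close() { }
}
```

With the buffering variant, nothing reaches the sink until flush() or close() runs, so any record still pending when a writer goes away is lost. The direct variant removes that window at the cost of one round trip per statement, which is why it is a useful experiment for separating buffering effects from connection or synchronization effects.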