[jira] [Updated] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-09-15 Thread Ferdy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy updated NUTCH-1097: - Attachment: NUTCH-1097-trunk_v1.patch The patch for Nutch trunk. application/xhtml+xml should be enabled in

[jira] [Commented] (NUTCH-1005) Index headings plugin

2011-09-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105456#comment-13105456 ] Julien Nioche commented on NUTCH-1005: -- you are right. I'd read your comments too

Re: Setting up Jenkins CI for Nutch Branches

2011-09-15 Thread lewis john mcgibbney
Hi Chris, If you could set me up it would be great. I will be reporting to the devs with any progress with the build so will progress to create the job in due course. Thank you On Thu, Sep 15, 2011 at 7:52 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Lewis, I'm

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-09-15 Thread Markus Jelsma
Hi Guys, I thought I'd chime in on this thread. My comments below: I understand and share your frustration, however you need to bear in mind that things are done only if people volunteer and have time - usually taken from their holiday, weekends, evenings. Chris (who is the de facto

[Nutch Wiki] Trivial Update of Archive and Legacy by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The Archive and Legacy page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Archive%20and%20Legacy?action=diff&rev1=19&rev2=20 === General Information === *

[Nutch Wiki] Trivial Update of Archive and Legacy by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The Archive and Legacy page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Archive%20and%20Legacy?action=diff&rev1=20&rev2=21 === General Information === *

[Nutch Wiki] Trivial Update of OldFAQs by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The OldFAQs page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/OldFAQs New page: This is the official resource for OLD Nutch FAQs. TableOfContents

Re: Setting up Jenkins CI for Nutch Branches

2011-09-15 Thread Mattmann, Chris A (388J)
Hi Lewis, [mattmann@minotaur]/home/mattmann(24): modify_appgroups.pl hudson-jobadmin --add=lewismc LDAP Password (^D aborts): Done! Notification sent to r...@apache.org. [mattmann@minotaur]/home/mattmann(25): Done! See: http://wiki.apache.org/general/Hudson And http://builds.apache.org/

[Nutch Wiki] Trivial Update of ErrorMessages by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The ErrorMessages page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=7&rev2=8 * Updating * Searching + Exception:

[Nutch Wiki] Trivial Update of FAQ by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FAQ page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FAQ?action=diff&rev1=127&rev2=128 Please visit our [[http://lucene.apache.org/nutch/bot.html|webmaster

[jira] [Created] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
Merging segments causes URLs to vanish from crawldb/index? -- Key: NUTCH-1113 URL: https://issues.apache.org/jira/browse/NUTCH-1113 Project: Nutch Issue Type: Bug Affects Versions:

[Nutch Wiki] Trivial Update of ErrorMessages by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The ErrorMessages page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=9&rev2=10 Please report bugs to the mailing list! +

[Nutch Wiki] Trivial Update of ErrorMessages by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The ErrorMessages page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=10&rev2=11 TableOfContents - == General == + = General =

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105661#comment-13105661 ] Markus Jelsma commented on NUTCH-1113: -- Can you rule out the indexer and see what you

[Nutch Wiki] Trivial Update of FAQ by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FAQ page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FAQ?action=diff&rev1=128&rev2=129 . Change this line: -^(file|ftp|mailto|https): to this:

[Nutch Wiki] Trivial Update of FAQ by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FAQ page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FAQ?action=diff&rev1=129&rev2=130 There's a user, developer, commits and agents lists, all available at

[Nutch Wiki] Trivial Update of OldFAQs by LewisJohnMcgibbney

2011-09-15 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The OldFAQs page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/OldFAQs?action=diff&rev1=1&rev2=2 TableOfContents + My system does not find the segments

[jira] [Commented] (NUTCH-251) Administration GUI

2011-09-15 Thread hadi (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105673#comment-13105673 ] hadi commented on NUTCH-251: Does this plugin work with nutch 1.3? Administration GUI

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1113: -- Attachment: merged_segment_output.txt unmerged_segment_output.txt Output for

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105693#comment-13105693 ] Edward Drapkin commented on NUTCH-1113: --- Using this command: nutch readseg -get

[jira] [Commented] (NUTCH-251) Administration GUI

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105696#comment-13105696 ] Markus Jelsma commented on NUTCH-251: - Not likely. There's, however, an open issue to

[jira] [Updated] (NUTCH-1112) off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE content

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1112: -- Attachment: httpresponse.patch Patch fixing off-by-1 error off-by-one error in

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1113: - Fix Version/s: 1.4 Thanks! It's marked for 1.4 now so it, at least, doesn't slip off the radar.

[jira] [Commented] (NUTCH-1112) off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE content

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105702#comment-13105702 ] Edward Drapkin commented on NUTCH-1112: --- All that needs to be changed is the needs

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105704#comment-13105704 ] Edward Drapkin commented on NUTCH-1113: --- I don't have any idea what's causing this

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105715#comment-13105715 ] Markus Jelsma commented on NUTCH-1113: -- Investigation, debug report; same stuff

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105714#comment-13105714 ] Edward Drapkin commented on NUTCH-1113: --- Upon further inspection, it appears that

[jira] [Issue Comment Edited] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105714#comment-13105714 ] Edward Drapkin edited comment on NUTCH-1113 at 9/15/11 9:52 PM:

[jira] [Updated] (NUTCH-1112) off-by-one error in protocol-httpclient; truncates up to HttpBase.BUFFER_SIZE content

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1112: - Patch Info: [Patch Available] Fix Version/s: 1.4 Thanks. Marked for 1.4. off-by-one

Build failed in Jenkins: Nutch-branch-1.4 #1

2011-09-15 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-branch-1.4/1/ -- [...truncated 949 lines...] A src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter A src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api AU

Build failed in Jenkins: Nutch-branch-1.4 #2

2011-09-15 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-branch-1.4/2/ -- [...truncated 950 lines...] A src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api AU src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexRule.java

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105758#comment-13105758 ] Edward Drapkin commented on NUTCH-1113: --- The more I look into this, the more I'm

Re: [Nutch Wiki] Trivial Update of OldFAQs by LewisJohnMcgibbney

2011-09-15 Thread Christopher Bader
How do I get off this list? I don't see an unsubscribe option. On Thu, Sep 15, 2011 at 3:00 PM, Apache Wiki wikidi...@apache.org wrote: Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The OldFAQs page has been changed by

Build failed in Jenkins: Nutch-trunk #1605

2011-09-15 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1605/ -- [...truncated 986 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java A

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-09-15 Thread Sami Siren
On Thu, Sep 15, 2011 at 9:55 PM, Markus Jelsma markus.jel...@openindex.io wrote: There are many things i can write about this topic right now but don't feel it's necessary. The choice is difficult and perhaps painful but when the voting round is opened by our project lead, i will vote for