[Nutch Wiki] Trivial Update of CommandLineOptions by MarkusJelsma

2011-10-12 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by MarkusJelsma: http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=37&rev2=38 == Other Classes == * bin/nutch

[jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2011-10-12 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125706#comment-13125706 ] Markus Jelsma commented on NUTCH-797: - Hm, seems the parser.fix.embeddedparams switch

[jira] [Assigned] (NUTCH-1084) ReadDB url throws exception

2011-10-12 Thread Markus Jelsma (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-1084: Assignee: Markus Jelsma ReadDB url throws exception ---

[jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2011-10-12 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125712#comment-13125712 ] Andrzej Bialecki commented on NUTCH-797: - That's unexpected :) I checked the patch

[jira] [Commented] (NUTCH-1084) ReadDB url throws exception

2011-10-12 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125715#comment-13125715 ] Markus Jelsma commented on NUTCH-1084: -- I've checked the write and read methods and

[jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2011-10-12 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125717#comment-13125717 ] Markus Jelsma commented on NUTCH-797: - This test was on a local instance. I tried both

Re: [jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2011-10-12 Thread Andrzej Bialecki
On 12/10/2011 13:17, Markus Jelsma (Commented) (JIRA) wrote: [ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125717#comment-13125717 ] Markus Jelsma commented on NUTCH-797:

Re: [jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2011-10-12 Thread Markus Jelsma
Ah yes, I copied the wrong URL indeed. I executed the commands with the URL enclosed in quotes and got the same output for both true and false for parser.fix.embeddedparams. Sorry :) On Wednesday 12 October 2011 13:59:05 Andrzej Bialecki wrote: On 12/10/2011 13:17, Markus Jelsma
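
For readers following the NUTCH-797 thread: the issue concerns relative link targets that consist only of a query string. RFC 3986 (section 5.4.1) resolves `?y` against the base `http://a/b/c/d;p?q` to `http://a/b/c/d;p?y`, keeping the last path segment, whereas the older RFC 2396 examples give `http://a/b/c/?y`, dropping it. The following is a minimal Java sketch of the RFC 3986 behavior, assuming the base URL is already absolute and well-formed. The class `PureQueryResolver` and its `resolve` method are hypothetical illustrations, not Nutch's actual code or API, and the sketch does not model the `parser.fix.embeddedparams` switch.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Nutch): resolves a link target against
 * a base URL, special-casing targets that consist only of a query.
 */
final class PureQueryResolver {

    private PureQueryResolver() {}

    static String resolve(String base, String target) {
        String t = target.trim();
        if (t.startsWith("?")) {
            // RFC 3986, section 5.2.2: an empty reference path keeps the
            // base path unchanged; only the query (and any fragment carried
            // by the target) replaces what the base had.
            String stripped = base;
            int hash = stripped.indexOf('#');
            if (hash >= 0) {
                stripped = stripped.substring(0, hash); // drop base fragment
            }
            int question = stripped.indexOf('?');
            if (question >= 0) {
                stripped = stripped.substring(0, question); // drop base query
            }
            return stripped + t;
        }
        // All other targets are delegated to the JDK's URI resolver.
        return URI.create(base).resolve(t).toString();
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q";
        System.out.println(resolve(base, "?y")); // http://a/b/c/d;p?y
        System.out.println(resolve(base, "g"));  // http://a/b/c/g
    }
}
```

Handling the pure-query case with plain string manipulation keeps the sketch independent of how a given resolver interprets an empty reference path.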

[jira] [Created] (NUTCH-1171) WebGraph to overwrite normalized input keys

2011-10-12 Thread Markus Jelsma (Created) (JIRA)
WebGraph to overwrite normalized input keys --- Key: NUTCH-1171 URL: https://issues.apache.org/jira/browse/NUTCH-1171 Project: Nutch Issue Type: Improvement Reporter: Markus Jelsma

[jira] [Resolved] (NUTCH-1171) WebGraph to overwrite normalized input keys

2011-10-12 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1171. -- Resolution: Duplicate Decided it would be better to incorporate this change in NUTCH-1142

[jira] [Updated] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-10-12 Thread Ferdy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy updated NUTCH-1097: - Attachment: NUTCH-1097-v4.patch NUTCH-1097-nutchgora_v2.patch Thanks for looking into this, too.

[jira] [Updated] (NUTCH-1142) Normalization and filtering in WebGraph

2011-10-12 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1142: - Attachment: NUTCH-1142-1.5-3.patch New patch with the ability to normalize and filter existing

[jira] [Commented] (NUTCH-1142) Normalization and filtering in WebGraph

2011-10-12 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125911#comment-13125911 ] Markus Jelsma commented on NUTCH-1142: -- The tests finished. Legacy URLs we had

[jira] [Commented] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-10-12 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125916#comment-13125916 ] Andrzej Bialecki commented on NUTCH-1097: -- +1, the latest patch looks good.

[jira] [Commented] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-10-12 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125925#comment-13125925 ] Markus Jelsma commented on NUTCH-1097: -- +1, very useful

[jira] [Commented] (NUTCH-1142) Normalization and filtering in WebGraph

2011-10-12 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125931#comment-13125931 ] Andrzej Bialecki commented on NUTCH-1142: -- +1, the patch looks good. (There is

[jira] [Commented] (NUTCH-1142) Normalization and filtering in WebGraph

2011-10-12 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125940#comment-13125940 ] Markus Jelsma commented on NUTCH-1142: -- You are right, of course, although the

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-10-12 Thread Sebastian Nagel (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125965#comment-13125965 ] Sebastian Nagel commented on NUTCH-1098: Spaces in URLs are quite frequent and

[jira] [Resolved] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-10-12 Thread Lewis John McGibbney (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1097. - Resolution: Fixed Fix Version/s: nutchgora Committed @ revision 1182504

[jira] [Closed] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-10-12 Thread Lewis John McGibbney (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1097. --- application/xhtml+xml should be enabled in plugin.xml of parse-html; allow

[jira] [Resolved] (NUTCH-1109) Add Sonar targets to Ant build.xml

2011-10-12 Thread Lewis John McGibbney (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1109. - Resolution: Fixed Committed @ revision 1182511 in nutchgora branch

[jira] [Closed] (NUTCH-1109) Add Sonar targets to Ant build.xml

2011-10-12 Thread Lewis John McGibbney (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1109. --- closing this issue which can now be dealt with by the infra team. Add

[jira] [Commented] (NUTCH-1001) bin/nutch fetch/parse handle crawl/segments directory

2011-10-12 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126029#comment-13126029 ] Lewis John McGibbney commented on NUTCH-1001: - Hi Gabriele, would it be