[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526363 ] Hudson commented on NUTCH-546: -- Integrated in Nutch-Nightly #203 (See

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526667 ] Hudson commented on NUTCH-546: -- Integrated in Nutch-Nightly #204 (See

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-09-03 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524462 ] Doğacan Güney commented on NUTCH-546: - OK, then. I will send a patch that 'pluginifies' UrlValidator soon. file

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-29 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523502 ] Doğacan Güney commented on NUTCH-546: - Btw, I also realized that a URL like http://localhost/ is considered

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-29 Thread Marc Brette (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523515 ] Marc Brette commented on NUTCH-546: --- Why don't we rely on the regexp urlfilter for removing such URLs? Is there a

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-29 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523539 ] Doğacan Güney commented on NUTCH-546: - Why don't we rely on the regexp urlfilter for removing such URLs? Is

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-29 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523578 ] Andrzej Bialecki commented on NUTCH-546: - +1 - I think it's the best solution so far. Regarding the use of

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-27 Thread Marc Brette (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522941 ] Marc Brette commented on NUTCH-546: --- Sounds good. The URL syntax is defined in RFC 1738

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-27 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522944 ] Doğacan Güney commented on NUTCH-546: - I don't know the design decision behind UrlValidator, but why didn't you

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-27 Thread Marc Brette (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522987 ] Marc Brette commented on NUTCH-546: --- Thanks. I like the normalizer approach. file URL are filtered out by the

[jira] Commented: (NUTCH-546) file URL are filtered out by the crawler

2007-08-26 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522852 ] Doğacan Güney commented on NUTCH-546: - This is true, I missed it when committing UrlValidator. I guess we can