[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526363
]
Hudson commented on NUTCH-546:
--
Integrated in Nutch-Nightly #203 (See
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526667
]
Hudson commented on NUTCH-546:
--
Integrated in Nutch-Nightly #204 (See
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524462
]
Doğacan Güney commented on NUTCH-546:
-
OK, then. I will send a patch that 'pluginifies' UrlValidator soon.
file
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523502
]
Doğacan Güney commented on NUTCH-546:
-
Btw, I also realized that a URL like http://localhost/; is considered
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523515
]
Marc Brette commented on NUTCH-546:
---
Why don't we rely on the regexp urlfilter for removing such URLs? Is there a
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523539
]
Doğacan Güney commented on NUTCH-546:
-
Why don't we rely on the regexp urlfilter for removing such URLs? Is
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523578
]
Andrzej Bialecki commented on NUTCH-546:
-
+1 - I think it's the best solution so far. Regarding the use of
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522941
]
Marc Brette commented on NUTCH-546:
---
Sounds good. The URL syntax is defined in RFC 1738
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522944
]
Doğacan Güney commented on NUTCH-546:
-
I don't know the design decision behind UrlValidator, but why didn't you
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522987
]
Marc Brette commented on NUTCH-546:
---
Thanks. I like the normalizer approach.
file URLs are filtered out by the
[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522852
]
Doğacan Güney commented on NUTCH-546:
-
This is true, I missed it when committing UrlValidator. I guess we can