[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986617#comment-15986617 ]
ASF GitHub Bot commented on NUTCH-1465:
---------------------------------------
sebastian-nagel commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
URL: https://github.com/apache/nutch/pull/189#discussion_r113684593
##########
File path: conf/nutch-default.xml
##########
@@ -2529,7 +2529,33 @@ visit https://wiki.apache.org/nutch/SimilarityScoringFilter-->
<value></value>
<description>
Default is 'fanout.key'
- The routingKey used by publisher to publish messages to specific queues.
If the exchange type is "fanout", then this property is ignored.
+ The routingKey used by publisher to publish messages to specific queues.
+ If the exchange type is "fanout", then this property is ignored.
+ </description>
+</property>
+
+<property>
Review comment:
These 3 properties are used to transfer command-line options from the Hadoop
client to the tasks. Their values are always overwritten when the job is set
up, so it doesn't make sense to set them here or in nutch-site.xml.
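
To illustrate the point, here is a minimal, self-contained sketch of the pattern. It is not the code from the pull request: a plain Map stands in for the Hadoop Configuration, and the property names and command-line flags are hypothetical, chosen only for illustration. The driver derives each value from the command line and writes it into the job configuration unconditionally, so whatever a config file supplied is lost.

```java
import java.util.HashMap;
import java.util.Map;

class OverwriteSketch {
    // Hypothetical property names; the real ones are not quoted in this comment.
    static final String STRICT = "example.sitemap.strict";
    static final String FILTER = "example.sitemap.filter";
    static final String NORMALIZE = "example.sitemap.normalize";

    // fileValues plays the role of values loaded from nutch-default.xml or
    // nutch-site.xml; args are the command-line options given to the client.
    static Map<String, String> buildJobConf(Map<String, String> fileValues, String[] args) {
        Map<String, String> conf = new HashMap<>(fileValues);

        // Defaults are decided by the driver, not by the config files.
        boolean strict = true;
        boolean filter = true;
        boolean normalize = true;

        // Hypothetical flags that switch the options off.
        for (String arg : args) {
            switch (arg) {
                case "-noStrict":
                    strict = false;
                    break;
                case "-noFilter":
                    filter = false;
                    break;
                case "-noNormalize":
                    normalize = false;
                    break;
                default:
                    break;
            }
        }

        // Unconditional writes: any value that came from a config file is replaced.
        conf.put(STRICT, String.valueOf(strict));
        conf.put(FILTER, String.valueOf(filter));
        conf.put(NORMALIZE, String.valueOf(normalize));
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> fileValues = new HashMap<>();
        fileValues.put(STRICT, "false"); // set in a config file, but never honored

        Map<String, String> conf = buildJobConf(fileValues, new String[] {"-noFilter"});
        System.out.println(conf.get(STRICT)); // prints "true": the file value was overwritten
        System.out.println(conf.get(FILTER)); // prints "false": taken from the command line
    }
}
```

Under these assumptions, an entry in nutch-default.xml would only suggest a configurability that does not exist, since the tasks only ever see the values written by the driver.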
----------------------------------------------------------------
> Support sitemaps in Nutch
> -------------------------
>
> Key: NUTCH-1465
> URL: https://issues.apache.org/jira/browse/NUTCH-1465
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.14
>
> Attachments: NUTCH-1465-sitemapinjector-trunk-v1.patch,
> NUTCH-1465-trunk.v1.patch, NUTCH-1465-trunk.v2.patch,
> NUTCH-1465-trunk.v3.patch, NUTCH-1465-trunk.v4.patch,
> NUTCH-1465-trunk.v5.patch
>
>
> I recently came across this rather stagnant codebase[0] which is ASL v2.0
> licensed and appears to have been used successfully to parse sitemaps as per
> the discussion here[1].
> [0] http://sourceforge.net/projects/sitemap-parser/
> [1]
> http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html