[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092964#comment-16092964 ]
Hudson commented on NUTCH-1465:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3435 (See [https://builds.apache.org/job/Nutch-trunk/3435/])
NUTCH-1465 (markus: [https://github.com/apache/nutch/commit/b58d6cd9111b2d25b8f6f009015ac214bac4006d])
* (edit) conf/log4j.properties
* (add) src/java/org/apache/nutch/util/SitemapProcessor.java
* (edit) ivy/ivy.xml
* (edit) conf/nutch-default.xml
* (edit) src/bin/nutch

> Support sitemaps in Nutch
> -------------------------
>
>                 Key: NUTCH-1465
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1465
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Lewis John McGibbney
>            Assignee: Markus Jelsma
>             Fix For: 1.14
>
>         Attachments: NUTCH-1465.patch, NUTCH-1465.patch, NUTCH-1465.patch,
> NUTCH-1465.patch, NUTCH-1465-sitemapinjector-trunk-v1.patch,
> NUTCH-1465-trunk.v1.patch, NUTCH-1465-trunk.v2.patch,
> NUTCH-1465-trunk.v3.patch, NUTCH-1465-trunk.v4.patch,
> NUTCH-1465-trunk.v5.patch
>
>
> I recently came across this rather stagnant codebase[0] which is ASL v2.0
> licensed and appears to have been used successfully to parse sitemaps as per
> the discussion here[1].
> [0] http://sourceforge.net/projects/sitemap-parser/
> [1] http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)