[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564019#comment-13564019 ]
Ken Krugler commented on NUTCH-1465:
------------------------------------

Hi Tejas -

I thought the current CC robots parsing code was already extracting the sitemap links.

Or is the above comment ("modified the robots parsing code to extract the links to sitemap pages") a change to the current Nutch robots parsing code?

I do remember thinking that the CC version would need to change to support multiple Sitemap links, even though it wasn't clear whether that was actually valid.

-- Ken

> Support sitemaps in Nutch
> -------------------------
>
>                 Key: NUTCH-1465
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1465
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Lewis John McGibbney
>             Fix For: 1.7
>
>         Attachments: NUTCH-1465-trunk.v1.patch
>
>
> I recently came across this rather stagnant codebase[0] which is ASL v2.0
> licensed and appears to have been used successfully to parse sitemaps as per
> the discussion here[1].
> [0] http://sourceforge.net/projects/sitemap-parser/
> [1] http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
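Ken's point about multiple Sitemap links concerns how robots.txt parsing collects `Sitemap:` lines. The sitemaps.org protocol allows more than one `Sitemap:` line per robots.txt and treats the directive as independent of any `User-agent` group. Below is a minimal Java sketch of collecting every such line, assuming the robots.txt body has already been fetched into a string. `SitemapLinkExtractor` and `extractSitemaps` are hypothetical names for illustration only; this is not the crawler-commons or Nutch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Nutch or crawler-commons): collects every
 * "Sitemap:" directive found in the text of a robots.txt file.
 */
public class SitemapLinkExtractor {

    public static List<String> extractSitemaps(String robotsTxt) {
        List<String> sitemaps = new ArrayList<>();
        // Split on any line ending (\r\n, \r or \n).
        for (String rawLine : robotsTxt.split("\\r\\n|\\r|\\n")) {
            // Drop a trailing "# comment", then surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            // The first colon separates the field name from its value,
            // so the "http:" inside the URL is left in the value.
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            // Field names are matched case-insensitively.
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            // Sitemap lines are not tied to a User-agent group, so they are
            // collected wherever they appear, and every one of them is kept.
            if (field.equals("sitemap") && !value.isEmpty()) {
                sitemaps.add(value);
            }
        }
        return sitemaps;
    }

    public static void main(String[] args) {
        String robotsTxt = String.join("\n",
            "User-agent: *",
            "Disallow: /private/",
            "Sitemap: http://www.example.com/sitemap1.xml",
            "",
            "User-agent: nutch",
            "Disallow:",
            "SITEMAP: http://www.example.com/sitemap2.xml.gz # second sitemap");
        // Prints both sitemap URLs, one per line, in file order.
        for (String url : extractSitemaps(robotsTxt)) {
            System.out.println(url);
        }
    }
}
```

Returning a list rather than a single value is the design choice that matters here: a parser that keeps only the last (or first) `Sitemap:` line silently drops the others.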