[ https://issues.apache.org/jira/browse/NUTCH-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2467:
---------------------------------
    Attachment: NUTCH-2467.patch

Incredibly stupid patch, but I did it because the sitemap.type thing being null is probably a bug. This patch should probably be reverted once that is fixed. Any CC comments on this?

> Sitemap type field can be null
> ------------------------------
>
>                 Key: NUTCH-2467
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2467
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.14
>
>         Attachments: NUTCH-2467.patch
>
>
> sitemap.isIndex() can return null for real sitemap indices, so their contents
> won't be added to the CrawlDB. For example, the indices that
> https://www.reisenco.nl/sitemap_index.xml points to are not processed.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)