[
https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1741:
----------------------------------------
Attachment: NUTCH-1741v7.patch
Managed to update this at the weekend and forgot to upload.
Some things which we need to consider:
* The mappings in the gora-*-mapping.xml files need to be tested more thoroughly, as
the backend mappings may not be the most efficient for storing the new sitemaps and
sitemap priority data structures.
* There are 4 tests being skipped in TestGeneratorJob. I'm going to log a new
ticket for this and we can fix it over there. This is not a blocker for
committing and further testing this rather substantial Sitemaps patch for 2.X.
Generally speaking, a sterling effort from [~alparslan.avci] and especially [~cguzel]
within GSoC 2015 :)
I'm going to commit to 2.X now as I've tested locally.
> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
> Key: NUTCH-1741
> URL: https://issues.apache.org/jira/browse/NUTCH-1741
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, generator
> Reporter: Alparslan Avcı
> Assignee: cihad güzel
> Labels: gsoc2015
> Fix For: 2.4
>
> Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch,
> NUTCH-1741-v4.patch, NUTCH-1741.patch, NUTCH-1741v5.patch,
> NUTCH-1741v6.patch, NUTCH-1741v7.patch, SitemapCrawlerLifeCycle.pdf,
> SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed
> in NUTCH-1465 for trunk.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)