[ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1741:
----------------------------------------
Attachment: NUTCH-1741v5.patch
Patch for 2.X HEAD that adds the missing license headers. It applies cleanly
with no fuzz, and it builds and passes the tests successfully.
[~cguzel], great work on this patch. There are a few issues.
1) As you mentioned on the mailing list, there are some issues with the
MemStore in Gora that we need to fix. We need the tests running in order to
exercise the code you've implemented and to build confidence in the sitemap
parser logic.
2) What about adding your implementation to the src/bin scripts? Are you happy
for it not to be part of the logic contained there? Perhaps we can revisit
this at a later stage.
3) I notice there is no Javadoc for the new classes you've implemented. Can you
add Javadoc describing what the Sitemap data structure (Map<CharSequence,
CharSequence>) looks like, how the logic works, etc.? This would make the code
much clearer to others reading it.
4) I like the way that you've consistently modularized code into methods
throughout your new work. This is really nice.
If we can address the above, then we will be ready to move on to further
validation through testing and then merging into 2.X.
> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
> Key: NUTCH-1741
> URL: https://issues.apache.org/jira/browse/NUTCH-1741
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, generator
> Reporter: Alparslan Avcı
> Labels: gsoc2015
> Fix For: 2.4
>
> Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch,
> NUTCH-1741-v4.patch, NUTCH-1741.patch, NUTCH-1741v5.patch,
> SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for the 2.x branch. It is being
> discussed in NUTCH-1465 for trunk.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)