[ 
https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1741:
----------------------------------------
    Attachment: NUTCH-1741v5.patch

Patch for 2.X HEAD which adds missing license headers, applies cleanly with no 
fuzziness and builds and tests successfully.

[~cguzel] great work on this patch. We have a few issues.
 1) As you've mentioned on the mailing list, we have some issues with the 
MemStore in Gora, and these need to be fixed first. We need to be running the 
tests in order to exercise the code you've implemented and also to build 
confidence in the sitemap parser logic.
 2) What about adding your implementation to the src/bin scripts? Are you happy 
with this not being part of the logic contained within there? Maybe at a later 
stage we can think about this.
 3) I notice no Javadoc for the new classes you've implemented... can you add 
Javadoc to detail what the Sitemap data structure (Map<CharSequence, 
CharSequence>) looks like, how the logic works, etc? This would make it much 
clearer to others trying to read the code.
 4) I like the way that you've consistently modularized code into methods 
throughout your new work. This is really nice.
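
To illustrate point 3, here is a rough sketch of the kind of Javadoc that 
would help. It is not taken from the patch: the class name 
SitemapMetadataSketch, the method toMetadata, and the key names are all 
hypothetical, with the keys borrowed from the sitemaps.org element names 
(lastmod, changefreq, priority). Plain Strings stand in for whatever 
CharSequence implementation the real code stores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch only (not code from the NUTCH-1741 patch).
 *
 * Holds the sitemap metadata of a single URL as a
 * Map&lt;CharSequence, CharSequence&gt;. The key names below are
 * assumptions based on the sitemaps.org element names.
 */
public class SitemapMetadataSketch {

  /** Hypothetical key: value of the lastmod element, e.g. "2015-06-01". */
  public static final String LAST_MODIFIED = "lastmod";

  /** Hypothetical key: value of the changefreq element, e.g. "daily". */
  public static final String CHANGE_FREQUENCY = "changefreq";

  /** Hypothetical key: value of the priority element, e.g. "0.8". */
  public static final String PRIORITY = "priority";

  /**
   * Builds the metadata map for one sitemap entry.
   *
   * @param lastModified    lastmod value, or null if the entry has none
   * @param changeFrequency changefreq value, or null if the entry has none
   * @param priority        priority value, or null if the entry has none
   * @return an insertion-ordered map containing only the values present
   */
  public static Map<CharSequence, CharSequence> toMetadata(
      String lastModified, String changeFrequency, String priority) {
    Map<CharSequence, CharSequence> metadata = new LinkedHashMap<>();
    putIfPresent(metadata, LAST_MODIFIED, lastModified);
    putIfPresent(metadata, CHANGE_FREQUENCY, changeFrequency);
    putIfPresent(metadata, PRIORITY, priority);
    return metadata;
  }

  /** Adds the pair only when the value is neither null nor empty. */
  private static void putIfPresent(
      Map<CharSequence, CharSequence> map, String key, String value) {
    if (value != null && !value.isEmpty()) {
      map.put(key, value);
    }
  }

  public static void main(String[] args) {
    // Prints the map built for an entry that carries all three values.
    System.out.println(toMetadata("2015-06-01", "daily", "0.8"));
  }
}
```

Documenting the expected keys and value formats at class level, as above, 
is the main thing a reader needs in order to follow the parser logic.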

If we can address the above, then we will be in a good position to think about 
further validation through testing and then merging into 2.X.



> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
>                 Key: NUTCH-1741
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1741
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator
>            Reporter: Alparslan Avcı
>              Labels: gsoc2015
>             Fix For: 2.4
>
>         Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, 
> NUTCH-1741-v4.patch, NUTCH-1741.patch, NUTCH-1741v5.patch, 
> SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed 
> in NUTCH-1465 for trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
