[
https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558401#comment-15558401
]
Alfonso Nishikawa edited comment on NUTCH-1741 at 10/8/16 5:46 PM:
-------------------------------------------------------------------
Attached a proposed patch for webpage.avsc ([^NUTCH-1741-webpage-avsc.patch]).
I suspect the creator of the final patch pressed backspace or moved a
bracket by accident just before creating NUTCH-1741v7.patch, since
WebPage.SCHEMA$ in the Persistent class has the right schema.
If you look at the schema of the version currently in the repository [1], near
the end it shows:
{code}
\"default\":{}},{\"name\":\"stmPriority\"
{code}
But the schema definition webpage.avsc at [2] is missing the brace that closes
that field:
{code}
"default": {
},
{
"name": "stmPriority",
{code}
The patch only fixes the schema file; no recompilation should be needed.
I use HBase, but with a customized Nutch that supports my own GORA-0.7-SNAPSHOT.
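A quick way to see the effect of the missing brace is to check whether a reduced version of the fragment parses as JSON. This is only a sketch: the field name `exampleMap` and the cut-down field lists are hypothetical stand-ins, not the real contents of webpage.avsc; only `stmPriority` and the `"default": {}` fragment come from the snippets above.

```python
import json

# Hypothetical, cut-down "fields" arrays. GOOD mirrors the shape seen in
# WebPage.SCHEMA$; BAD mirrors webpage.avsc, where the brace closing the
# field before "stmPriority" is missing.
GOOD = '[{"name": "exampleMap", "default": {}}, {"name": "stmPriority"}]'
BAD = '[{"name": "exampleMap", "default": {}, {"name": "stmPriority"}]'


def parses(text):
    """Return True if text is well-formed JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print("SCHEMA$-style fragment parses:", parses(GOOD))
print("avsc-style fragment parses:", parses(BAD))
```

Running the same kind of check against the full webpage.avsc before and after applying the patch should show whether the file is well-formed again, assuming the missing brace is the only problem.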
[1] -
https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/java/org/apache/nutch/storage/WebPage.java#L31
[2] -
https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/gora/webpage.avsc#L294
> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
> Key: NUTCH-1741
> URL: https://issues.apache.org/jira/browse/NUTCH-1741
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher, generator
> Reporter: Alparslan Avcı
> Assignee: Cihad Guzel
> Labels: gsoc2015
> Fix For: 2.4
>
> Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch,
> NUTCH-1741-v4.patch, NUTCH-1741-webpage-avsc.patch, NUTCH-1741.patch,
> NUTCH-1741v5.patch, NUTCH-1741v6.patch, NUTCH-1741v7.patch,
> SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed
> in NUTCH-1465 for trunk.