Hi Lewis,

it seems to be related to NUTCH-1714: the maps owned by WebPage (metadata, headers, etc.) are no longer initialized in the constructor. This also causes other tests to fail.
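To illustrate the difference, here is a minimal sketch. It uses a hypothetical stand-in class, PageSketch, not the real WebPage, and it assumes only the behaviour described above: the bare constructor leaves the map unset, while a builder supplies an empty default. The real generated class may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a generated persistent class.
// This is NOT org.apache.nutch.storage.WebPage; names are for illustration only.
class PageSketch {
    // Left null by the bare constructor (the assumed post-upgrade behaviour).
    private Map<String, String> metadata;

    PageSketch() { }

    Map<String, String> getMetadata() { return metadata; }

    static Builder newBuilder() { return new Builder(); }

    static class Builder {
        PageSketch build() {
            PageSketch page = new PageSketch();
            // Assumption: the builder fills in an empty default map,
            // so callers can use it without a null check.
            page.metadata = new HashMap<>();
            return page;
        }
    }
}

class WebPageInitSketch {
    public static void main(String[] args) {
        PageSketch viaConstructor = new PageSketch();
        PageSketch viaBuilder = PageSketch.newBuilder().build();

        // A test calling getMetadata().put(...) on the first object would
        // throw a NullPointerException; on the second it works.
        System.out.println("constructor map is null: "
                + (viaConstructor.getMetadata() == null));
        viaBuilder.getMetadata().put("key", "value");
        System.out.println("builder map size: " + viaBuilder.getMetadata().size());
    }
}
```

A test that obtains its object through the builder path never sees the null map, which is why changing only the construction line in each test should be enough if the assumption holds.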
The solution would be to replace
  WebPage page = new WebPage();
by
  WebPage page = WebPage.newBuilder().build();
in every test where a WebPage object is needed. Right?

I'll open a Jira and try to provide a patch.

Cheers,
Sebastian

On 06/18/2014 06:07 PM, Lewis John Mcgibbney wrote:
> Hi Folks,
> A while ago, somewhere, we broke the 2.x build!
> I've described this in NUTCH-1792
> <https://issues.apache.org/jira/browse/NUTCH-1792>
> Here is the paste log which somewhere includes the commit which broke the
> build.
> Does anyone have a clue why the TestImageMetadata test for parse-tika is
> failing?
> Thanks
> Lewis
>
> ------------------------------------------------------------------------
> r1601937 | jnioche | 2014-06-11 11:56:20 -0400 (Wed, 11 Jun 2014) | 1 line
>
> NUTCH-1736 <https://issues.apache.org/jira/browse/NUTCH-1736> Can't fetch
> page if http response header contains Transfer-Encoding:chunked
> ------------------------------------------------------------------------
> r1600837 | markus | 2014-06-06 06:01:51 -0400 (Fri, 06 Jun 2014) | 2 lines
>
> NUTCH-1782 <https://issues.apache.org/jira/browse/NUTCH-1782> NodeWalker to
> return current node
>
> ------------------------------------------------------------------------
> r1600599 | jnioche | 2014-06-05 07:09:42 -0400 (Thu, 05 Jun 2014) | 1 line
>
> Fixing blunder in Nutch-1781
> ------------------------------------------------------------------------
> r1600561 | lewismc | 2014-06-04 23:00:10 -0400 (Wed, 04 Jun 2014) | 1 line
>
> NUTCH-1788 <https://issues.apache.org/jira/browse/NUTCH-1788> Tika may return
> multiple values for Title on PDF's
> ------------------------------------------------------------------------
> r1600559 | lewismc | 2014-06-04 22:17:14 -0400 (Wed, 04 Jun 2014) | 1 line
>
> Temporary disable TestGoraStore due to GORA-326
> <https://issues.apache.org/jira/browse/GORA-326>
> Removal of _g_dirty field from _ALL_FIELDS array and Field Enum in Persistent
> classes
> ------------------------------------------------------------------------
> r1600546 | lewismc | 2014-06-04 20:18:02 -0400 (Wed, 04 Jun 2014) | 1 line
>
> NUTCH-1781 <https://issues.apache.org/jira/browse/NUTCH-1781> Update
> gora-*-mapping.xml and gora.proeprties to reflect Gora 0.4
> ------------------------------------------------------------------------
> r1598622 | jnioche | 2014-05-30 10:55:51 -0400 (Fri, 30 May 2014) | 1 line
>
> NUTCH-1768 <https://issues.apache.org/jira/browse/NUTCH-1768> Upgrade to
> ElasticSearch 1.1.0
> ------------------------------------------------------------------------
> r1598619 | jnioche | 2014-05-30 10:50:45 -0400 (Fri, 30 May 2014) | 1 line
>
> NUTCH-1634 <https://issues.apache.org/jira/browse/NUTCH-1634> : readdb -stats
> shows the result twice
> ------------------------------------------------------------------------
> r1595398 | lewismc | 2014-05-16 20:38:18 -0400 (Fri, 16 May 2014) | 1 line
>
> NUTCH-1780 <https://issues.apache.org/jira/browse/NUTCH-1780> ttl and
> gc_grace_seconds attributes are missing from gora-cassandra-mapping.xml file
> ------------------------------------------------------------------------
> r1595196 | jnioche | 2014-05-16 09:40:21 -0400 (Fri, 16 May 2014) | 1 line
>
> NUTCH-1676 <https://issues.apache.org/jira/browse/NUTCH-1676> Add rudimentary
> SSL support to protocol-http
> ------------------------------------------------------------------------
> r1594813 | jnioche | 2014-05-15 04:14:38 -0400 (Thu, 15 May 2014) | 1 line
>
> NUTCH-1674 <https://issues.apache.org/jira/browse/NUTCH-1674> Use batchId
> filter to enable scan
> (GORA-119 <https://issues.apache.org/jira/browse/GORA-119>) for
> Fetch,Parse,Update,Index (Tien Nguyen Manh and Alparslan Avcı via jnioche)
> ------------------------------------------------------------------------
> r1594812 | jnioche | 2014-05-15 04:10:07 -0400 (Thu, 15 May 2014) | 1 line
>
> NUTCH-1714 <https://issues.apache.org/jira/browse/NUTCH-1714> Nutch 2.x
> upgrade to Gora 0.4
> ------------------------------------------------------------------------
> r1594071 | snagel | 2014-05-12 15:39:43 -0400 (Mon, 12 May 2014) | 1 line
>
> NUTCH-1752 <https://issues.apache.org/jira/browse/NUTCH-1752> Cache
> robots.txt rules per protocol:host:port
> ------------------------------------------------------------------------
> r1593954 | jnioche | 2014-05-12 08:58:41 -0400 (Mon, 12 May 2014) | 1 line
>
> NUTCH-1613 <https://issues.apache.org/jira/browse/NUTCH-1613> Timeouts in
> protocol-httpclient when crawling same host with >2 threads
> ------------------------------------------------------------------------
> r1592414 | snagel | 2014-05-04 16:18:50 -0400 (Sun, 04 May 2014) | 1 line
>
> NUTCH-1182 <https://issues.apache.org/jira/browse/NUTCH-1182> fetcher to log
> hung threads
> ------------------------------------------------------------------------
>
> --
> /Lewis/

