Came across this issue :0)

https://issues.apache.org/jira/browse/NUTCH-956

which seems to clear up the mystery with this one. It also reminded me of
this recent conversation [0].

I will test and get a JUnit case written before attaching a new patch to
the issue.

[0] http://www.mail-archive.com/user%40nutch.apache.org/msg07272.html

On Sat, Aug 25, 2012 at 1:18 PM, Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> I've had a random patch lying around one of my desktops for some time.
>
> 1) schema.xml is straightforward enough
> 2) MoreIndexingFilter.java seems to be an issue of reliability
> (possibly). Maybe the HTTP header content information can be
> unreliable at times? Does anyone have an opinion on this? At the
> moment I am none the wiser but keen to gather views and/or experiences.
> 3) Again in SolrWriter.java this may be an issue of reliability
> (accuracy?) regarding the proposed explicit equals() check instead
> of the reference (!=) comparison. Any thoughts?
>
> I did not produce this patch and can't remember how or why it ended up
> on my desktop! So apologies for the randomness of this one.
>
> Thanks
>
> Lewis
>
>
> Index: conf/schema.xml
> ===================================================================
> --- conf/schema.xml (revision 1145734)
> +++ conf/schema.xml (working copy)
> @@ -113,6 +113,8 @@
>          <!-- fields for creativecommons plugin -->
>          <field name="cc" type="string" stored="true" indexed="true" multiValued="true"/>
> +
> +        <field name="tld" type="string" stored="false" indexed="false"/>
>      </fields>
>      <uniqueKey>id</uniqueKey>
>      <defaultSearchField>content</defaultSearchField>
>
> Index: src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
> ===================================================================
> --- src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java (revision 1053817)
> +++ src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java (working copy)
> @@ -172,7 +172,7 @@
>     */
>    private NutchDocument addType(NutchDocument doc, WebPage page, String url) {
>      MimeType mimeType = null;
> -    Utf8 contentType = page.getFromHeaders(new Utf8(HttpHeaders.CONTENT_TYPE));
> +    Utf8 contentType = page.getContentType();
>      if (contentType == null) {
>        // Note by Jerome Charron on 20050415:
>        // Content Type not solved by a previous plugin
> Index: src/java/org/apache/nutch/indexer/solr/SolrWriter.java
> ===================================================================
> --- src/java/org/apache/nutch/indexer/solr/SolrWriter.java (revision 1053817)
> +++ src/java/org/apache/nutch/indexer/solr/SolrWriter.java (working copy)
> @@ -56,7 +56,7 @@
>        for (final String val : e.getValue()) {
>          inputDoc.addField(solrMapping.mapKey(e.getKey()), val);
>          String sCopy = solrMapping.mapCopyKey(e.getKey());
> -        if (sCopy != e.getKey()) {
> +        if (! sCopy.equals(e.getKey())) {
>            inputDoc.addField(sCopy, val);
>          }
>        }
>
> --
> Lewis

--
Lewis
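
For anyone following point 3, here is a minimal sketch of why the SolrWriter.java hunk swaps `!=` for `equals()`. In Java, `!=` on two String objects compares references, not characters, so two keys with identical text can still compare as "different". The `CopyKeyDemo` class and its `mapCopyKey` below are hypothetical stand-ins written for this illustration; they are not the Nutch implementation, and whether the real `mapCopyKey` ever returns a distinct-but-equal String is an assumption I have not verified.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a copy-field mapping; NOT the Nutch class.
class CopyKeyDemo {
    private final Map<String, String> copyMap = new HashMap<>();

    CopyKeyDemo() {
        copyMap.put("url", "urlCopy"); // assumed mapping, for illustration only
    }

    // Returns the mapped copy key, or a key with the same characters when
    // no mapping exists. new String(...) deliberately creates a distinct
    // object, mimicking keys built at runtime (assumption for this demo).
    String mapCopyKey(String key) {
        String mapped = copyMap.get(key);
        return mapped != null ? mapped : new String(key);
    }

    // Old check from the patch: reference (identity) comparison.
    static boolean differsByReference(String a, String b) {
        return a != b;
    }

    // New check from the patch: value comparison.
    static boolean differsByValue(String a, String b) {
        return !a.equals(b);
    }

    public static void main(String[] args) {
        CopyKeyDemo mapping = new CopyKeyDemo();
        for (String key : new String[] {"title", "url"}) {
            String sCopy = mapping.mapCopyKey(key);
            System.out.println(key + ": != says " + differsByReference(sCopy, key)
                    + ", equals says " + differsByValue(sCopy, key));
        }
    }
}
```

With the unmapped key "title", the reference check reports a difference while the value check does not, so under these assumptions the old code would add the same field twice. For the mapped key "url" both checks agree. The `equals()` form is the safer of the two because it does not depend on how the mapping happens to hand back its Strings.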

