Re: Class Cast exception

Matt Zytaruk Fri, 06 Jan 2006 13:23:54 -0800

So will this throw an exception on older segments, or will it just not get the correct metadata? I have a lot of older segments I still need to use.

Thanks for your help.

-Matt Zytaruk


Andrzej Bialecki wrote:

Matt Zytaruk wrote:
Here you go.

java.lang.ClassCastException: java.util.ArrayList
       at org.apache.nutch.parse.ParseData.write(ParseData.java:122)
       at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:51)
       at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:57)
       at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:168)
       at org.apache.nutch.mapred.MapTask$1.collect(MapTask.java:78)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:229)
       at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:123)
Congratulations! You are the first person to actually use (and suffer from) the multiple values in ContentProperties... ;-)
It turns out that ParseData.write() uses its own method for writing out metadata, instead of using ContentProperties.write(). It works well if you only have single values (then they are stored as Strings), but if there are multiple values they are stored in ArrayLists, which ParseData accesses directly by virtue of using metadata.entrySet().iterator().
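
To make the failure mode concrete, here is a minimal sketch, assuming a plain Map stands in for ContentProperties. The class MultiValueCastDemo and its helpers are hypothetical and are not Nutch code; they only imitate the "String for one value, ArrayList for several" storage described above and the blind (String) cast in the old write loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued property map; NOT the Nutch ContentProperties class.
class MultiValueCastDemo {

    // Stores the first value for a key as a String and switches to an ArrayList
    // once a second value arrives, imitating the storage scheme described above.
    static void add(Map<String, Object> props, String key, String value) {
        Object existing = props.get(key);
        if (existing == null) {
            props.put(key, value);
        } else if (existing instanceof String) {
            List<String> values = new ArrayList<>();
            values.add((String) existing);
            values.add(value);
            props.put(key, values);
        } else {
            @SuppressWarnings("unchecked")
            List<String> values = (List<String>) existing;
            values.add(value);
        }
    }

    // Imitates the old write loop: casts every value to String without checking its type.
    // Returns true if that cast throws a ClassCastException.
    static boolean naiveCastFails(Map<String, Object> props) {
        try {
            for (Map.Entry<String, Object> e : props.entrySet()) {
                String ignored = (String) e.getValue();
            }
            return false;
        } catch (ClassCastException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        add(props, "Content-Type", "text/html");
        System.out.println("single values only, cast fails: " + naiveCastFails(props));
        add(props, "Set-Cookie", "a=1");
        add(props, "Set-Cookie", "b=2");
        System.out.println("with a repeated key, cast fails: " + naiveCastFails(props));
    }
}
```

The cast only breaks once some key has been given a second value, which would explain why the problem stays hidden until a page actually carries a repeated property.
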
The fix is easy: please replace the following lines in ParseData.write():

   out.writeInt(metadata.size());                // write metadata
   Iterator i = metadata.entrySet().iterator();
   while (i.hasNext()) {
     Map.Entry e = (Map.Entry)i.next();
     UTF8.writeString(out, (String)e.getKey());
     UTF8.writeString(out, (String)e.getValue());
   }

with this:

   metadata.write(out);
and the same for reading the metadata field; in ParseData.readFields(), replace this:
   int propertyCount = in.readInt();             // read metadata
   metadata = new ContentProperties();
   for (int i = 0; i < propertyCount; i++) {
     metadata.put(UTF8.readString(in), UTF8.readString(in));
   }

with this:

   metadata = new ContentProperties();
   metadata.readFields(in);
Compile, deploy, test, report ... :-) Please note that this changes the on-disk segment format, so you won't be able to read the old segments with the new code. You may want to bump the ParseData.VERSION, and leave this code to handle older versions...
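
One way to keep old segments readable is to branch on the stored version when reading. The following is only a sketch under stated assumptions: the byte layout (a version byte, a key count, then either one value per key or a value count followed by the values) is invented for illustration and is not the actual ParseData or ContentProperties format, and VersionedMetadataDemo with its constants and methods is hypothetical. DataOutput.writeUTF and DataInput.readUTF stand in for UTF8.writeString and UTF8.readString.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of version-aware metadata reading; the layout is an assumption.
class VersionedMetadataDemo {
    static final byte OLD_VERSION = 1; // assumed: exactly one String value per key
    static final byte CUR_VERSION = 2; // assumed: a list of values per key

    // Writes the assumed old layout: version, key count, then key/value pairs.
    static void writeOld(DataOutput out, Map<String, String> meta) throws IOException {
        out.writeByte(OLD_VERSION);
        out.writeInt(meta.size());
        for (Map.Entry<String, String> e : meta.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Writes the assumed new layout: version, key count, then key, value count, values.
    static void write(DataOutput out, Map<String, List<String>> meta) throws IOException {
        out.writeByte(CUR_VERSION);
        out.writeInt(meta.size());
        for (Map.Entry<String, List<String>> e : meta.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().size());
            for (String v : e.getValue()) {
                out.writeUTF(v);
            }
        }
    }

    // Reads either layout, treating an old record as one value per key.
    static Map<String, List<String>> read(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version != OLD_VERSION && version != CUR_VERSION) {
            throw new IOException("unsupported version " + version);
        }
        int keyCount = in.readInt();
        Map<String, List<String>> meta = new LinkedHashMap<>();
        for (int i = 0; i < keyCount; i++) {
            String key = in.readUTF();
            int valueCount = (version == OLD_VERSION) ? 1 : in.readInt();
            List<String> values = new ArrayList<>();
            for (int j = 0; j < valueCount; j++) {
                values.add(in.readUTF());
            }
            meta.put(key, values);
        }
        return meta;
    }

    // Round-trips an old-layout record through the version-aware reader.
    static Map<String, List<String>> roundTripOld(Map<String, String> meta) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeOld(new DataOutputStream(bytes), meta);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Round-trips a new-layout record through the same reader.
    static Map<String, List<String>> roundTripCurrent(Map<String, List<String>> meta) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), meta);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<>();
        old.put("Content-Type", "text/html");
        System.out.println(roundTripOld(old));

        Map<String, List<String>> cur = new LinkedHashMap<>();
        cur.put("Set-Cookie", List.of("a=1", "b=2"));
        System.out.println(roundTripCurrent(cur));
    }
}
```

In this sketch the writer always emits the newest layout, while the reader accepts both, so segments written before the change would stay usable without being rewritten.
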
