The mimetype is not the same thing as the encoding. As Ken pointed out, encoding detection is done at the individual parser level.
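To illustrate the distinction, here is a minimal sketch in plain Python (not Nutch code; the helper name `split_content_type` and the sample header value are made up for illustration). It uses the standard library to separate the MIME type from the charset parameter of a Content-Type header, then shows that the same bytes decode differently depending on the charset, which is why knowing the MIME type alone is not enough.

```python
from email.message import Message


def split_content_type(header_value):
    """Return (mime_type, charset) parsed from a Content-Type header value.

    Hypothetical helper for illustration only; charset is None when the
    header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


# The MIME type says what kind of document this is; the charset says how
# its bytes map to characters.
mime_type, charset = split_content_type("text/html; charset=ISO-8859-1")

# Same bytes, same MIME type, different result depending on the charset.
raw = b"caf\xe9"
as_latin1 = raw.decode("iso-8859-1")
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

When the header has no charset parameter the second value comes back as `None`, and in that case the encoding has to be worked out from the content itself, which is the part a parser would have to handle.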

On 14 February 2012 23:51, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> This was indeed an issue until today. The detected type is in the crawl datum
> metadata.
>
> https://issues.apache.org/jira/browse/NUTCH-1259
>
> > Hi,
> >
> > I can't see anywhere within our parser plugins where we detect encoding of
> > documents. I've also begun looking through the o.a.n.p package but again I
> > can't see anything.
> >
> > Can anyone provide some detail on this please?
> >
> > Thank you
> >
> > Lewis

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

