Hi,

This was indeed an issue until today. The detected type is in the crawl datum metadata.
https://issues.apache.org/jira/browse/NUTCH-1259

> Hi,
>
> I can't see anywhere within our parser plugins where we detect encoding of
> documents. I've also begun looking through the o.a.n.p package but again I
> can't see anything.
>
> Can anyone provide some detail on this please?
>
> Thank you
>
> Lewis

