The MIME type is not the same thing as the encoding. As Ken pointed out,
encoding detection is done at the individual parser level.
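
To illustrate the distinction (a minimal sketch, not Nutch code): an HTTP
Content-Type header such as "text/html; charset=ISO-8859-1" carries the MIME
type and the character encoding as two separate pieces of information, and
the charset part is often missing altogether. The helper name
split_content_type is hypothetical, and Python's standard library is used
here only for illustration.

```python
from email.message import Message


def split_content_type(header_value):
    """Split a Content-Type header value into (mime_type, charset).

    Hypothetical helper for illustration only. charset is None when the
    header does not declare one, which is the case where a parser has to
    detect the encoding from the document content itself.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns the lowercased "maintype/subtype" part;
    # get_content_charset() returns the lowercased charset parameter or None.
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    print(split_content_type("text/html; charset=ISO-8859-1"))
    print(split_content_type("application/pdf"))
```

Knowing the type is text/html says nothing about whether the bytes are
UTF-8 or ISO-8859-1, which is why the two are detected separately.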

On 14 February 2012 23:51, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> This was indeed an issue until today. The detected type is in the crawl
> datum metadata.
>
> https://issues.apache.org/jira/browse/NUTCH-1259
>
> > Hi,
> >
> > I can't see anywhere within our parser plugins where we detect encoding
> > of documents. I've also begun looking through the o.a.n.p package but
> > again I can't see anything.
> >
> > Can anyone provide some detail on this please?
> >
> > Thank you
> >
> > Lewis
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
