It's in HTMLParser#sniffCharacterEncoding (declared private static String). I'm still wondering where TikaParser gets the character encoding from, though. Additionally, this doesn't look like something we check for in our JUnit classes. If we don't, I would like to write some tests for it.
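To make the idea concrete, here is a rough sketch of what a sniffing routine of this kind generally does: decode the first chunk of the raw bytes as ASCII and look for a charset declaration in a meta tag. This is an illustration only, not the actual HTMLParser code; the class name, the 2000-byte chunk size and the regular expression are all my own assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the real parser code: names and constants are assumed.
class EncodingSniffSketch {

    // Assumed limit: only the head of the document is inspected.
    private static final int CHUNK_SIZE = 2000;

    // Matches charset=... inside a <meta ...> tag, with or without quotes.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([a-zA-Z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or null if no declaration is found.
    static String sniffCharacterEncoding(byte[] content) {
        int length = Math.min(content.length, CHUNK_SIZE);
        // Decoding as ASCII is enough to find the tag: markup is ASCII-compatible
        // in most encodings, and undecodable bytes simply become replacement chars.
        String head = new String(content, 0, length, StandardCharsets.US_ASCII);
        Matcher matcher = META_CHARSET.matcher(head);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] page = "<html><head><meta charset=\"utf-8\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniffCharacterEncoding(page));
    }
}
```

A JUnit test for the real method could follow the same shape: feed in byte arrays with and without a meta declaration and assert on the returned charset name.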
I am working on Any23 tests first, so this provides the justification behind my question.

Thanks
Lewis

On Tue, Feb 14, 2012 at 10:00 PM, Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> I can't see anywhere within our parser plugins where we detect encoding of
> documents. I've also begun looking through the o.a.n.p package but again I
> can't see anything.
>
> Can anyone provide some detail on this please?
>
> Thank you
>
> Lewis
>
> --
> *Lewis*

--
*Lewis*

