Also, when we can't find anything else, we fall back to the encoding given by
the parser.character.encoding.default property, which is windows-1252.
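
To make the fallback order concrete, here is a rough sketch in plain Java. It is not the actual Nutch code: the class name, the resolve and validOrNull helpers, and the idea that a charset from the HTTP header is tried before the sniffed one are all assumptions for illustration. Only the property name and the windows-1252 default come from this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Properties;

// Hypothetical illustration only; none of these names exist in Nutch.
public class EncodingFallbackSketch {

    // Property name as mentioned in this thread.
    static final String DEFAULT_ENCODING_KEY = "parser.character.encoding.default";
    // Last-resort value as mentioned in this thread.
    static final String FALLBACK = "windows-1252";

    // Returns the canonical charset name if the label is usable, otherwise null.
    static String validOrNull(String label) {
        if (label == null || label.trim().isEmpty()) {
            return null;
        }
        try {
            return Charset.forName(label.trim()).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null;
        }
    }

    // Assumed order: header charset, then sniffed charset, then the configured default.
    static String resolve(String headerCharset, String sniffedCharset, Properties conf) {
        String enc = validOrNull(headerCharset);
        if (enc == null) {
            enc = validOrNull(sniffedCharset);
        }
        if (enc == null) {
            enc = validOrNull(conf.getProperty(DEFAULT_ENCODING_KEY, FALLBACK));
        }
        // If even the configured value is unusable, keep the hard-coded fallback.
        return enc != null ? enc : FALLBACK;
    }
}
```

A JUnit test for this kind of logic would mostly need to feed in documents with missing or bogus charset declarations and check that the default is what comes out.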

On Tue, Feb 14, 2012 at 10:34 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> It's in HTMLParser, in the method private static String sniffCharacterEncoding.
>
> I'm still wondering where TikaParser gets the character encoding from,
> though. Additionally, this doesn't look like something we check for in our
> JUnit classes. If we don't, then I would like to write some tests for
> this.
>
> I am working on Any23 tests first, so this provides the justification
> behind my question.
>
> Thanks
>
> Lewis
>
>
> On Tue, Feb 14, 2012 at 10:00 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> Hi,
>>
>> I can't see anywhere within our parser plugins where we detect encoding
>> of documents. I've also begun looking through the o.a.n.p package but again
>> I can't see anything.
>>
>> Can anyone provide some detail on this please?
>>
>> Thank you
>>
>> Lewis
>>
>>
>>
>> --
>> *Lewis*
>>
>>
>
>
> --
> *Lewis*
>
>


-- 
*Lewis*
