MediaTypeRegistry normalize query

Tom Barber Sun, 07 Sep 2014 13:29:13 -0700

Hey guys

I was working on something involving MimeTypes.getRegisteredMimeType, and within that method it calls


registry.normalize(type)

Now, when parsing HTML files these days, Tika adds the charset attribute to the string.

I would have thought the normalize call was designed to remove this, because tika-mimetypes.xml surely isn't supposed to contain charset-matching tags?


Anyway if you do

Tika.detect(myurl)

followed by

MimeTypes.getRegisteredMimeType("text/html; charset=UTF-8");

It returns null because it doesn't strip the charset; without the charset it's fine.
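
As a stopgap, the parameters could be stripped by hand before the lookup. This is only a rough sketch: stripParameters is a hypothetical helper, not part of Tika, and it assumes the registry lookup wants just the bare type/subtype. Tika's own MediaType.getBaseType() may well be the better way to do this.

```java
import java.util.Locale;

public class MediaTypeStrings {

    // Hypothetical helper (not a Tika API): drops everything from the first
    // ';' onwards, so "text/html; charset=UTF-8" becomes "text/html".
    static String stripParameters(String type) {
        if (type == null) {
            return null;
        }
        int semicolon = type.indexOf(';');
        String base = semicolon >= 0 ? type.substring(0, semicolon) : type;
        // Trim stray whitespace and lower-case the type/subtype part.
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints "text/html"
        System.out.println(stripParameters("text/html; charset=UTF-8"));
    }
}
```

The stripped string would then be what gets passed to MimeTypes.getRegisteredMimeType instead of the raw detected value.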

Bug/Feature/Misunderstanding?

Regards

Tom
--
*Tom Barber* | Technical Director

meteorite bi
*T:* +44 20 8133 3730
*W:* www.meteorite.bi | *Skype:* meteorite.consulting
*A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
