Jukka Zitting
Fri, 29 Jan 2010 04:16:49 -0800
Hi,
Interesting graph from Google about the relative usage of different
character encodings:
http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html
It's interesting to see that the Unicode entry only lists the UTF-8
encoding. Are the other Unicode encodings (UTF-16, UTF-32) really that
infrequent?
I think we can use this data as a guideline when optimizing the
encoding detection code in Tika.
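
As a rough sketch of the idea (this is not Tika's actual detector; the
class name, method names and candidate order below are made up for
illustration and not taken from the Google numbers), one could try the
candidates that are assumed to be most common first and stop at the
first charset that decodes the bytes without errors:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika.
final class PrioritizedCharsetGuess {

    // Assumed priority order, cheapest and (presumably) most common first.
    // Real weights would have to come from measured usage data.
    static final Charset[] CANDIDATES = {
        StandardCharsets.US_ASCII,
        StandardCharsets.UTF_8,
        StandardCharsets.ISO_8859_1
    };

    // Returns true if the bytes decode in the given charset with no
    // malformed or unmappable input.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Walks the candidates in priority order and returns the first match.
    // ISO-8859-1 accepts every byte sequence, so it doubles as the fallback.
    static Charset guess(byte[] data) {
        for (Charset candidate : CANDIDATES) {
            if (decodesCleanly(data, candidate)) {
                return candidate;
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        byte[] sample = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(sample));
    }
}
```

A strict decode like this only tells us that a charset is possible, not
that it is right, so it would complement rather than replace
statistical detection. The point is just that the ordering of the
checks can follow the observed frequencies.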
BR,
Jukka Zitting