Hi! I have installed the new variant of Extension:FileIndexer (http://www.mediawiki.org/wiki/Extension_talk:FileIndexer#New_Variant) from Ramon Dohle (raZe) on my MediaWiki 1.12 wiki, and it works well for English text. However, when I upload a PDF file containing French accented characters such as e-acute ("é"), those characters are wrongly indexed and are also shown wrongly on the file upload page.
I've looked inside the wiki database (table wikiprefix_searchindex, column si_text) and found that an e-acute is stored as the string "u8c3a9" for any standard page, while it is stored as "u8efbfbd" in the entry for the uploaded PDF. In fact, every accented character in the PDF entry is stored as "u8efbfbd"! Of course searching doesn't work with such a character substitution.

"u8c3a9" matches the UTF-8 encoding of e-acute (bytes C3 A9). I'm not sure about "u8efbfbd", but it seems to be some kind of placeholder.

Any advice appreciated.

--
[email protected]
Author of ICS (Internet Component Suite, freeware)
Author of MidWare (Multi-tier framework, freeware)
http://www.overbyte.be

_______________________________________________
MediaWiki-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
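
For anyone who wants to check the two strings quoted above, here is a small Python sketch. It assumes that the si_text tokens are simply "u8" followed by the hex of the UTF-8 bytes, which is what the two values in this post look like; the helper name decode_index_token is made up for illustration. The last step is only a guess at one possible cause (Latin-1 text coming out of the PDF extraction and being read as UTF-8), not something confirmed for FileIndexer.

```python
def decode_index_token(token: str) -> str:
    """Decode a 'u8'-prefixed hex token (as seen in si_text) into text.

    Assumption: the part after 'u8' is the hex of the UTF-8 bytes.
    """
    if not token.startswith("u8"):
        raise ValueError("expected a token starting with 'u8'")
    return bytes.fromhex(token[2:]).decode("utf-8")


# Token found for a normal wiki page: bytes C3 A9.
normal = decode_index_token("u8c3a9")

# Token found for the uploaded PDF: bytes EF BF BD.
broken = decode_index_token("u8efbfbd")

# Hypothetical cause: a Latin-1 encoded e-acute (single byte E9) is
# decoded as UTF-8 with invalid bytes replaced instead of rejected.
simulated = "\u00e9".encode("latin-1").decode("utf-8", errors="replace")

print(repr(normal), repr(broken), repr(simulated))
```

The first token decodes to U+00E9 (e-acute), and the second decodes to U+FFFD, the Unicode replacement character. If that reading is right, the accented characters were already lost before the text reached the search index, so the place to look would be the encoding of the text extracted from the PDF rather than the index itself.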
