Good morning all (or good afternoon)
I used Lucene many times before, to search text in French Or English. All worked fine :-) But now I have a new challenge, I need to use Lucene with Khmer (Khmer is the Cambodia’s language, it looks like Thai or Indian) But it doesn’t work, my code is well executed but it found no results, I give you my code below I thought UTF-8 is 100% handled by Java and that we have “nothing to do” My code is working fine when I use English words. Thanks in advance for your help :-) Here is my source code : /*************************************************************************** *************************************/ public class TestKhmer { public static void main(String[] args) throws Exception { Analyzer analyzer = new StandardAnalyzer(); Directory directory = FSDirectory.getDirectory("C:\\Folder\\indexLucene", true); IndexWriter iwriter = new IndexWriter(directory, analyzer, true); iwriter.setMaxFieldLength(25000); Document doc = new Document(); String text = getContents("C:\\Folder\\file.txt"); // this file was saved as UTF-8 format by UltraEdit , when I open it I see my Khmer charactere Field field = new Field("text", text, Field.Store.YES, Field.Index.TOKENIZED); Field field2 = new Field("filename", "file.txt" , Field.Store.YES, Field.Index.TOKENIZED); doc.add(field); doc.add(field2); iwriter.addDocument(doc); iwriter.close(); // Now search the index: IndexSearcher isearcher = new IndexSearcher(directory); String stringToSearch = getContents("C:\\Folder\\dataToSearch.txt"); // my search string is located in a text file, this file was saved as UTF-8 format by UltraEdit, when I open it I see my Khmer charactere String stringQuery = "text:" + stringToSearch ; QueryParser queryParser = new QueryParser("text" , analyzer); Query query = queryParser.parse(stringQuery); Hits hits = isearcher.search(query); // Iterate through the results: for (int i = 0; i < hits.length(); i++) { Document hitDoc = hits.doc(i); System.out.println("Result : " + hitDoc.get("filename")); } isearcher.close(); directory.close(); } private static String getContents(String path) throws Exception { String line = null; StringBuffer sb = new StringBuffer(); BufferedReader br; try { br = new BufferedReader( new InputStreamReader( new FileInputStream(path), "UTF-8")); // as I told you, my file is in UTF-8 format while((line = br.readLine()) != null) { sb.append(line + "\n"); } } catch (Exception e) { e.printStackTrace(); } return sb.toString(); } } /*************************************************************************** *************************************/ Best regards Nicolas -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.410 / Virus Database: 268.17.8/649 - Release Date: 1/23/2007