Good morning all (or good afternoon)
I have used Lucene many times before to search text in French or English,
and it all worked fine :-)
But now I have a new challenge: I need to use Lucene with Khmer (Khmer is
the language of Cambodia; its script looks somewhat like Thai or Indian
scripts).
But it doesn't work: my code runs without errors but finds no results. My
code is given below.
I thought UTF-8 was 100% handled by Java and that we had "nothing to do".
My code works fine when I use English words.
Thanks in advance for your help :-)
Here is my source code:
/***************************************************************************
*************************************/
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TestKhmer
{
    public static void main(String[] args) throws Exception
    {
        Analyzer analyzer = new StandardAnalyzer();
        Directory directory =
            FSDirectory.getDirectory("C:\\Folder\\indexLucene", true);
        IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
        iwriter.setMaxFieldLength(25000);

        // Build a single document from the Khmer text file.
        // This file was saved in UTF-8 format by UltraEdit; when I open it
        // I see my Khmer characters.
        Document doc = new Document();
        String text = getContents("C:\\Folder\\file.txt");
        Field field = new Field("text", text,
            Field.Store.YES, Field.Index.TOKENIZED);
        Field field2 = new Field("filename", "file.txt",
            Field.Store.YES, Field.Index.TOKENIZED);
        doc.add(field);
        doc.add(field2);
        iwriter.addDocument(doc);
        iwriter.close();

        // Now search the index.
        IndexSearcher isearcher = new IndexSearcher(directory);
        // My search string is located in a text file; this file was also
        // saved in UTF-8 format by UltraEdit, and when I open it I see my
        // Khmer characters.
        String stringToSearch = getContents("C:\\Folder\\dataToSearch.txt");
        String stringQuery = "text:" + stringToSearch;
        QueryParser queryParser = new QueryParser("text", analyzer);
        Query query = queryParser.parse(stringQuery);
        Hits hits = isearcher.search(query);

        // Iterate through the results.
        for (int i = 0; i < hits.length(); i++)
        {
            Document hitDoc = hits.doc(i);
            System.out.println("Result : " + hitDoc.get("filename"));
        }
        isearcher.close();
        directory.close();
    }

    // Reads the whole file at the given path as UTF-8 text.
    private static String getContents(String path) throws Exception
    {
        String line = null;
        StringBuffer sb = new StringBuffer();
        BufferedReader br = null;
        try
        {
            // As I told you, my file is in UTF-8 format.
            br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"));
            while ((line = br.readLine()) != null)
            {
                sb.append(line).append("\n");
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            // Close the reader even if reading failed part-way.
            if (br != null)
            {
                br.close();
            }
        }
        return sb.toString();
    }
}
/***************************************************************************
*************************************/
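To separate the file-reading side from the Lucene side, the string returned
by getContents can be inspected before it is indexed or parsed. Below is a
minimal sketch that uses only the JDK. The class name KhmerCheck, its two
helper methods and the sample string are made up for illustration; they are
not part of Lucene or of the program above. It counts the code points that
fall in the Unicode Khmer block (U+1780 to U+17FF) and reports whether the
text starts with U+FEFF, the byte order mark that some editors write at the
start of UTF-8 files. As far as I know, Java's UTF-8 decoder leaves that
mark in the string.

```java
public class KhmerCheck
{
    // Counts the code points that belong to the Unicode Khmer block.
    public static int countKhmer(String s)
    {
        int count = 0;
        for (int i = 0; i < s.length(); )
        {
            int cp = s.codePointAt(i);
            if (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.KHMER)
            {
                count++;
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return count;
    }

    // True if the decoded text begins with a byte order mark (U+FEFF).
    public static boolean startsWithBom(String s)
    {
        return s.length() > 0 && s.charAt(0) == '\uFEFF';
    }

    public static void main(String[] args)
    {
        // Hypothetical sample: a byte order mark followed by five Khmer
        // code points, written as escapes so that the encoding of this
        // source file does not matter.
        String sample = "\uFEFF\u1781\u17D2\u1798\u17C2\u179A";
        System.out.println("Khmer code points : " + countKhmer(sample));
        System.out.println("Starts with BOM   : " + startsWithBom(sample));
    }
}
```

If countKhmer returns zero for the contents of file.txt, the problem is in
how the file is saved or read. If it returns a sensible number, the text
reaches Lucene intact and the analyzer's handling of Khmer characters is the
next thing to look at.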
Best regards
Nicolas