Hi Farzad,

Hmmm, where to begin... This is a tough question and one that warrants a fair amount of research. I would start by taking a look at the TREC cross-language tracks and the CLEF conference.

I have used Lucene to index and search English documents as well as Arabic, French, Spanish, Dutch, and other languages. In general, you need some way of transforming a source-language query into a target-language query, OR you need some way of automatically translating all your documents into the same language. How you do this is really a matter of research, eh? The most basic approach to the query transformation problem is to use a bilingual dictionary to look up the source terms and get their target-language equivalents.
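To make the dictionary idea concrete, here is a rough sketch in plain Java (no Lucene dependency). The class name, the method names, and the romanized sample words are all made up for illustration; a real system would load a proper English-Persian dictionary in Persian script. When a source term has several translations, the sketch keeps them all and groups them with OR, which is the shape the classic Lucene query parser accepts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: turns a source-language query into a
// target-language query string by bilingual dictionary lookup.
class DictionaryQueryTranslator {
    // source term (lowercased) -> candidate target-language terms
    private final Map<String, List<String>> dictionary = new HashMap<>();

    void addEntry(String sourceTerm, String... targetTerms) {
        dictionary
            .computeIfAbsent(sourceTerm.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
            .addAll(Arrays.asList(targetTerms));
    }

    String translate(String sourceQuery) {
        List<String> clauses = new ArrayList<>();
        for (String term : sourceQuery.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<String> targets = dictionary.get(term);
            if (targets == null) {
                // Assumption: out-of-dictionary terms (names, acronyms)
                // pass through unchanged rather than being dropped.
                clauses.add(term);
            } else if (targets.size() == 1) {
                clauses.add(targets.get(0));
            } else {
                // Ambiguous term: keep every candidate, grouped with OR.
                clauses.add("(" + String.join(" OR ", targets) + ")");
            }
        }
        return String.join(" ", clauses);
    }
}
```

With the romanized placeholder entries "bank" -> "baank", "saahel" and "book" -> "ketaab", translate("bank book") gives "(baank OR saahel) ketaab". Keeping every candidate is the simplest choice; picking among them (disambiguation, weighting) is where the research comes in.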

As for Lucene, you will need an Analyzer that handles Persian (try googling "Persian Lucene Analyzer"); you may very well have to write your own. The actual indexing and search tasks are relatively straightforward as Lucene tasks, and there are a number of good tutorials and books on how to do that.
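If you do end up writing your own, much of the work is deciding how to fold character variants so that index-time and query-time text match. The following is only a sketch of that idea in plain Java, not a Lucene Analyzer and not any existing library's behavior: the class name and the particular folding rules (Arabic yeh and kaf mapped to their Persian forms, zero-width non-joiner treated as a space) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing step for Persian text; the folding
// rules below are illustrative assumptions, not a standard.
class PersianTextFolder {

    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u064A': // Arabic yeh
                case '\u0649': // alef maksura
                    out.append('\u06CC'); // Farsi yeh
                    break;
                case '\u0643': // Arabic kaf
                    out.append('\u06A9'); // keheh
                    break;
                case '\u200C': // zero-width non-joiner
                    out.append(' '); // assumption: split at ZWNJ
                    break;
                default:
                    // Lowercasing only affects cased scripts such as Latin.
                    out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    // Whitespace tokenization over the folded text.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : fold(text).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

Whatever rules you settle on, the same folding has to run on both documents and queries, which is exactly what plugging it into an Analyzer gives you.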

Good luck,
Grant

On Aug 13, 2007, at 6:30 AM, Farzad Mahdikhani wrote:

 Dear All,

I would like to implement a cross-lingual IR system that supports Persian and English for an academic research task. How can I use Lucene for this? How should I proceed? What are the requirements?

 Regards,
 Farzad

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

