Hi Maurits,

With the language guesser it doesn't matter whether the languages are in one index or in language-specific indexes; that is more a question of how you want to organise your data. Even if you have separate language dictionaries, I think it would be best to have a language field holding the guessed language of the document.
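
To make the language field idea a bit more concrete, here is a minimal sketch in plain Java. It deliberately uses no Lucene or TextCat classes: LanguageFieldSketch, buildFields, analyzerForQuery and the analyzer names are all made up for illustration, and the guessed language is assumed to come from whatever guesser you use. It stores the guessed language in a field next to the text, and picks an analyzer for a query from the language the user selected, falling back to a default query language.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from Lucene or JTextCat.
class LanguageFieldSketch {

    // Assumed default query language, used when nothing else is known.
    static final String DEFAULT_LANGUAGE = "en";

    // Language code -> name of the analyzer to use. The analyzer names are
    // placeholders for whatever analyzers you actually have available.
    static final Map<String, String> ANALYZER_BY_LANGUAGE = Map.of(
            "en", "english",
            "nl", "dutch",
            "de", "german");

    // Builds the fields for one document. The "language" field holds the
    // guessed language, whether you keep one index or one per language.
    static Map<String, String> buildFields(String text, String guessedLanguage) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("contents", text);
        fields.put("language", normalize(guessedLanguage));
        return fields;
    }

    // Picks the analyzer for a query: the user's selection if it is a
    // supported language, otherwise the analyzer of the default language.
    static String analyzerForQuery(String userSelectedLanguage) {
        String fallback = ANALYZER_BY_LANGUAGE.get(DEFAULT_LANGUAGE);
        return ANALYZER_BY_LANGUAGE.getOrDefault(normalize(userSelectedLanguage), fallback);
    }

    // Lower-cases and trims a language code; null or empty input is treated
    // as the default language (an assumption made for this sketch).
    static String normalize(String language) {
        if (language == null || language.trim().isEmpty()) {
            return DEFAULT_LANGUAGE;
        }
        return language.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> doc = buildFields("Dit is een korte tekst.", "nl");
        System.out.println("language field: " + doc.get("language"));
        System.out.println("analyzer for selected 'nl': " + analyzerForQuery("nl"));
        System.out.println("analyzer with no selection: " + analyzerForQuery(null));
    }
}
```

At query time the language field can then be used as an extra restriction, so that a query in the selected language only matches documents guessed to be in that language.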
An alternative would be language tagging, where you embed language tags in the document and can in this way correctly handle documents that comprise more than one language. Unfortunately, I don't think there are any open-source language taggers.

Cheers

Pete

----- Original Message -----
From: "maurits van wijland" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Saturday, November 08, 2003 7:30 AM
Subject: Re: Java TextCat 0.1

> Pete,
>
> It's because I think of a search engine as a guided search engine. It
> should offer the 'end-user' help when trying to find information. So a
> drop-down should not be included in the search interface.
>
> Of course a drop-down is a good method for choosing a query language. Are
> the different languages in different indexes or are they all combined
> into one?
>
> chrs,
>
> Maurits
>
> ----- Original Message -----
> From: "Pete Lewis" <[EMAIL PROTECTED]>
> To: "Lucene Developers List" <[EMAIL PROTECTED]>
> Sent: Friday, November 07, 2003 8:58 PM
> Subject: Re: Java TextCat 0.1
>
> > Hi Maurits
> >
> > Language guessing is OK for documents where you have a fair amount of
> > text to play with; search clues, however, are much shorter, often just
> > a word or two. So why don't you have a default query language and then
> > just have a drop-down box to let the user select the query language if
> > it differs from the default?
> >
> > Cheers
> >
> > Pete
> >
> > ----- Original Message -----
> > From: "maurits van wijland" <[EMAIL PROTECTED]>
> > To: "Lucene Developers List" <[EMAIL PROTECTED]>
> > Sent: Friday, November 07, 2003 7:12 PM
> > Subject: Re: Java TextCat 0.1
> >
> > > Hi all,
> > >
> > > Incze, do you choose the analyzer when indexing and searching? How?
> > > Can you send some example code?
> > >
> > > I have tried this with a naive Bayes language guesser, but the
> > > problem I found is that when searching, the query words are too
> > > 'small' to accurately predict a language...
> > >
> > > So, how do you manage?
> > >
> > > kind regards,
> > >
> > > Maurits van Wijland
> > >
> > > ----- Original Message -----
> > > From: "Incze Lajos" <[EMAIL PROTECTED]>
> > > To: "Lucene Developers List" <[EMAIL PROTECTED]>
> > > Sent: Friday, November 07, 2003 2:31 AM
> > > Subject: Re: Java TextCat 0.1
> > >
> > > > On Thu, Nov 06, 2003 at 02:14:11PM +0100, Patrick Debois wrote:
> > > > > Java interfacing with libtextcat. Might be of interest for you
> > > > > (according to the mailing lists)
> > > > >
> > > > > I've used it for choosing the correct analyzer in Lucene Snowball
> > > > >
> > > > > I will provide it on my website
> > > > > http://www.jedi.be/JTextCat/index.html
> > > > >
> > > > > Hope it does not violate any copyrights.
> > > > >
> > > > > ---------------------------------------------------------------------
> > > >
> > > > Have you seen this project?
> > > >
> > > > http://ngramj.sourceforge.net/
> > > >
> > > > (Pure java N-Gram lib, with a sample servlet.)
> > > >
> > > > incze
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
