Re: Proximity Search between phrases

2008-12-29 Thread Cool The Breezer
You could also use phrase queries, such as "Economic Meltdown" AND "Asian Countries", but these phrases may be too distant from one another to be relevant for your search. To get better results with respect to position (the distance between phrases), you can use SpanNearQuery. Let me know if you need more
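
A plain-Java sketch of the proximity idea in the reply above may help. It does not use Lucene: the class PhraseProximity, its methods, and the whitespace tokenizer are hypothetical and only illustrate what "distance between phrases" means. In Lucene itself, each phrase would typically become an inner SpanNearQuery over SpanTermQuery clauses (slop 0, in order), wrapped in an outer SpanNearQuery whose slop bounds the gap between the two phrases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration of phrase proximity; hypothetical names, not Lucene code. */
final class PhraseProximity {

    /** Lowercases and splits on whitespace (an assumed, very simple analyzer). */
    static String[] tokenize(String s) {
        return s.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    /** Returns every token position at which the phrase starts. */
    static List<Integer> phraseStarts(String[] tokens, String[] phrase) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(tokens, i, i + phrase.length), phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }

    /**
     * True if some occurrence of phrase a and some occurrence of phrase b
     * are separated by at most maxGap tokens, in either order.
     */
    static boolean near(String text, String a, String b, int maxGap) {
        String[] tokens = tokenize(text);
        String[] pa = tokenize(a);
        String[] pb = tokenize(b);
        for (int sa : phraseStarts(tokens, pa)) {
            for (int sb : phraseStarts(tokens, pb)) {
                // Number of tokens between the end of the earlier phrase
                // and the start of the later one; negative means overlap.
                int gap = sa < sb ? sb - (sa + pa.length) : sa - (sb + pb.length);
                if (gap >= 0 && gap <= maxGap) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Calling PhraseProximity.near(text, "Economic Meltdown", "Asian Countries", 2) accepts a document only when at most two tokens separate the phrases, which is the kind of constraint a plain AND of two phrase queries cannot express.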

Re: Re: Proximity Search between phrases

2008-12-29 Thread tom
AUTOMATIC REPLY LUX is closed until 5th January 2009 - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

duplication checking while indexing

2008-12-29 Thread Chris Lu
I am wondering whether there is an easy way to avoid duplication while indexing, just using the index being created, without creating other data structures. In some cases, the incoming document list can have duplicates. For example, when creating spell checking indexes for phrases. Each phrase is

Help in Arabic Analysers with Lucene on Windows

2008-12-29 Thread Girish Naik
Hi, I am having a hard time indexing Arabic content and searching it via Lucene. I have also used an Arabic Analyzer from the Lucene package but had no luck. I have also used a snowball jar but it doesn't contain an Arabic stemmer. So I had put the Lucene Arabic Stemmer in

Re: Help in Arabic Analysers with Lucene on Windows

2008-12-29 Thread Grant Ingersoll
Hi Girish, Can you provide some sample code and info about what isn't working? All you have said so far is that the Arabic Analyzer doesn't work for you, but you have said nothing about how you are actually using it. Are you getting exceptions? Do the tokens not look right? Are no

Re: Help in Arabic Analysers with Lucene on Windows

2008-12-29 Thread Girish Naik
Sorry for that, Here is how the Analyzer is Selected: public static Analyzer getAnalyzerInstance(String localeKey) { Analyzer analyzer = null; if (localeKey == null || localeKey.trim().equals("")) { localeKey = AppContext.getSetting("defaultLocale"); System.out.println("Locale key taken

Re: Re: Re: Payloads

2008-12-29 Thread Greg Shackles
That sounds pretty cool Karl, and I also dig your use of Motorhead as an example : ) I recently built an application where payloads were a lifesaver, but my usage of them is pretty basic. I am indexing pages of text, so I use payloads to store metadata about each word on the page - size, color,

Where to get login details for Luke

2008-12-29 Thread NageswaraRao M
Hi guys, can you please tell me where to get login details for Luke? Thanks, Nagesh

Re: Payloads

2008-12-29 Thread Peter Keegan
Hi Karl, I use payloads for weight only, too, with BoostingTermQuery (see: http://www.nabble.com/BoostingTermQuery-scoring-td20323615.html#a20323615). A custom tokenizer looks for the reserved character '\b' followed by a 2 byte 'boost' value. It then creates a special Token type for a custom
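
The marker scheme described above could look roughly like the following plain-Java sketch. Everything beyond "reserved character '\b' followed by a 2 byte boost" is an assumption: the class BoostMarkerParser, the BoostedToken type, the big-endian byte order, storing each boost byte in one char, and using -1 for "no boost" are all hypothetical, and no Lucene Tokenizer or payload API is used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a '\b' + 2-byte boost marker scanner; not Lucene code. */
final class BoostMarkerParser {
    static final char MARKER = '\b';

    /** A term plus its boost; boost is -1 when no marker followed the term. */
    static final class BoostedToken {
        final String term;
        final int boost;

        BoostedToken(String term, int boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    /** Splits on whitespace and attaches a boost to any term followed by the marker. */
    static List<BoostedToken> parse(String input) {
        List<BoostedToken> out = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == MARKER && i + 2 < input.length()) {
                // Assumed layout: high byte first, one byte per char.
                int boost = ((input.charAt(i + 1) & 0xFF) << 8) | (input.charAt(i + 2) & 0xFF);
                flush(out, term, boost);
                i += 3; // skip the marker and both boost bytes
            } else if (Character.isWhitespace(c)) {
                flush(out, term, -1);
                i++;
            } else {
                // Ordinary character (or a marker too close to the end to carry a boost).
                term.append(c);
                i++;
            }
        }
        flush(out, term, -1);
        return out;
    }

    /** Emits the pending term, if any, and resets the buffer. */
    private static void flush(List<BoostedToken> out, StringBuilder term, int boost) {
        if (term.length() > 0) {
            out.add(new BoostedToken(term.toString(), boost));
            term.setLength(0);
        }
    }
}
```

Scanning character by character, instead of splitting on spaces first, keeps a boost byte that happens to equal a space from being mistaken for a token boundary.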

Re: Help in Arabic Analysers with Lucene on Windows

2008-12-29 Thread Grant Ingersoll
On Dec 29, 2008, at 9:59 AM, Girish Naik wrote: FIELD_BODY is defined as public static final String FIELD_BODY = AVS_FIELD_BODY; and its indexed as ParsedDoc webdoc = ParsedDoc.getDoc(page); ... document.add(new Field(Constants.FIELD_BODY, webdoc.getContents(), Field.Store.NO,

Re: Where to get login details for Luke

2008-12-29 Thread Erick Erickson
Ummm, I don't understand the question. You don't need to log in; Luke is a stand-alone program for examining Lucene indexes. You *do* have to point Luke at your index; there should be some choice about opening a file. I don't have Luke in front of me here at home, but poke around the menus and it

Re: Where to get login details for Luke

2008-12-29 Thread Aaron Schon
It is just File | Open Lucene Index :)

Re: Help in Arabic Analysers with Lucene on Windows

2008-12-29 Thread Girish Naik
Thanks Grant, I will check this out. BTW, as far as the Lucene version is concerned, I checked out Lucene from svn and created a build; its version says 2.9 :). And Luke is version 0.9.1. Regards,

Re: duplication checking while indexing

2008-12-29 Thread Otis Gospodnetic
Chris, Mark Miller & Co. are working on (Near) Duplicate Detection. I think the work is in Solr's JIRA, but some of it might be applicable to Lucene. Otis

Re: Help in Arabic Analysers with Lucene on Windows

2008-12-29 Thread Grant Ingersoll
On Dec 29, 2008, at 11:25 AM, Girish Naik wrote: "Thanks Grant I will check this out. BTW, as far as Lucene version is concerned I had checked out the svn of lucene and created a build its version says as 2.9 :) . And Luke is of version 0.9.1" You will need to plug in your own Lucene

Re: duplication checking while indexing

2008-12-29 Thread Chris Lu
Otis, thanks for the pointer. I think the question becomes: how can one access TermEnum or TermInfos during indexing? If this is possible, things would be easier. -- Chris Lu

Re: duplication checking while indexing

2008-12-29 Thread liu Ivan
I use JDBM to store each document's key ID.
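
The key-store approach from the reply above can be sketched in plain Java. This is only an illustration under stated assumptions: an in-memory HashSet stands in for JDBM (which persists keys to disk), the class DedupingIndexer is hypothetical, and the trim-and-lowercase key normalization is an assumed choice for phrase data. Within Lucene itself, IndexWriter.updateDocument(Term, Document), which deletes any documents containing the term before adding the new one, is another option worth considering when each document has a unique key field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: skip documents whose key was already seen. Not Lucene code. */
final class DedupingIndexer {
    // Stand-in for an external key store such as JDBM (assumption).
    private final Set<String> seenKeys = new HashSet<>();
    // Stand-in for the index being built.
    private final List<String> indexed = new ArrayList<>();

    /** Adds the phrase unless an equivalent key was already indexed. */
    boolean addIfNew(String phrase) {
        String key = phrase.trim().toLowerCase(Locale.ROOT); // assumed normalization
        if (!seenKeys.add(key)) {
            return false; // duplicate: Set.add reports the key was already present
        }
        indexed.add(phrase);
        return true;
    }

    /** Returns the phrases accepted so far, in insertion order. */
    List<String> indexed() {
        return indexed;
    }
}
```

The trade-off is the one raised in the original question: this needs a second data structure alongside the index, in exchange for a cheap membership test per incoming document.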

IndexCommit#getFileNames() returning duplicates?

2008-12-29 Thread Shalin Shekhar Mangar
Hello, Solr uses IndexCommit#getFileNames() to get a list of files for replication. One Windows user reported an exception which looks like it may have been caused by IndexCommit#getFileNames() returning duplicate file names. The exception in his case was caused by _21e.tvx appearing more than once.

Re: Field Not Present In Document

2008-12-29 Thread Amin Mohammed-Coleman
Hi, Thanks for your reply. It turns out you were correct and I was not loading the correct document. User error! Cheers, Amin. On 28 Dec 2008, at 19:50, Grant Ingersoll gsing...@apache.org wrote: How do you know that document in question has an id of 1, as in when you do: Document