Re: StopWord elimination pls. HELP

2004-10-18 Thread Morus Walter
Miro Max writes: String cont = rs.getString(x); d.add(Field.Text(cont, cont)); writer.addDocument(d); to get results from a database into lucene index. but when i check println(d) i can see the german stopwords too. how can i eliminate this? Stopwords in an analyzer don't make the
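Morus's reply is cut off, but the point it starts to make is that stopwords are removed by the analyzer when a field is indexed. They are not removed from the stored text, and the stored text is what println(d) prints. The following is a stdlib-only analogy, not Lucene's API. The class name StopwordDemo, the tiny German stopword list, and the "split on non-letters" tokenizer are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not Lucene code: the analyzer step decides which
// terms are indexed, while the stored field keeps the original text untouched.
public class StopwordDemo {
    // Small sample of German stopwords (assumption; a real list is much longer).
    static final Set<String> GERMAN_STOPWORDS = new HashSet<>(Arrays.asList(
            "der", "die", "das", "und", "ist", "ein", "eine", "nicht", "mit"));

    // Lowercase, split on anything that is not a letter, drop stopwords.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !GERMAN_STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String stored = "Der Hund und die Katze"; // what println(d) would show
        List<String> indexed = analyze(stored);   // what would end up in the index
        System.out.println("stored:  " + stored);
        System.out.println("indexed: " + indexed); // [hund, katze]
    }
}
```

In Lucene 1.4 this work is done by the analyzer handed to the IndexWriter, for example org.apache.lucene.analysis.de.GermanAnalyzer. The stored value of a Field.Text field is kept verbatim no matter which analyzer is used.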

Re: StopWord elimination pls. HELP

2004-10-18 Thread Miro Max
thanks for your help --- Morus Walter [EMAIL PROTECTED] wrote: Miro Max writes: String cont = rs.getString(x); d.add(Field.Text(cont, cont)); writer.addDocument(d); to get results from a database into lucene index. but when i check println(d) i can see the german stopwords

SV: index, reindexing problem

2004-10-18 Thread MATL (Mats Lindberg)
Great, that seems to solve the problem. Mats -Original Message- From: Chuck Williams [mailto:[EMAIL PROTECTED] Sent: 17 October 2004 19:21 To: Lucene Users List Subject: RE: index, reindexing problem I had this same problem a while back. It should be resolved if you move the writer =

Re: Atomicity in Lucene operations

2004-10-18 Thread Yonik Seeley
Hi Nader, I would greatly appreciate it if you could CC me on the docs or the code. Thanks! Yonik --- Nader Henein [EMAIL PROTECTED] wrote: It's pretty integrated into our system at this point, I'm working on Packaging it and cleaning up my documentation and then I'll make it available, I

n-gram indexing for generating spell suggestions

2004-10-18 Thread Aad Nales
Hi All, After having used the suggested algorithms for a few weeks I found that the suggestions were not completely to my liking: 1. small words (1, 2 or 3 characters) were never corrected. Especially three-letter words and abbreviations suffered. 2. often used misspellings in Dutch words between 4
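The general idea behind n-gram spell suggestions is to split each dictionary word into overlapping character n-grams. Those grams are indexed, and candidates are ranked by how many grams they share with the misspelled term. Below is a minimal stdlib sketch of that idea, not the code from the post. The class NGrams is hypothetical, and so is the choice to return a too-short word as its own single gram. That choice is one possible way to keep 1 to 3 character words from producing no grams at all.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: character n-grams of a word, as one might index them
// in a separate spelling index. Not the code from the original post.
public class NGrams {
    // All substrings of length n; a word no longer than n yields itself so that
    // very short words still produce at least one gram (assumption).
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        if (word.length() <= n) {
            grams.add(word);
            return grams;
        }
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Number of distinct grams two words share; higher means more similar.
    static int sharedGrams(String a, String b, int n) {
        Set<String> ga = new HashSet<>(ngrams(a, n));
        Set<String> gb = new HashSet<>(ngrams(b, n));
        ga.retainAll(gb);
        return ga.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("fiets", 3));               // [fie, iet, ets]
        System.out.println(sharedGrams("fiets", "fietz", 3)); // 2
    }
}
```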

RE: n-gram indexing for generating spell suggestions

2004-10-18 Thread Alexey Lef
You can also store a phonetic key for the term to find sounds-like matches. I use double metaphone algorithm which appears to be English specific. Not sure if there is something out there for Dutch. For the length, I use relative distance cutoff (distance/length) in addition to the absolute
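Alexey describes two edit-distance cutoffs used together: an absolute limit on the distance, and a relative limit on distance divided by length. The relative limit stops short words from being "corrected" into unrelated words. A minimal sketch under stated assumptions follows. Plain Levenshtein distance is used, "length" is taken to be the length of the user's term, and the class name and thresholds are hypothetical.

```java
// Hypothetical sketch of combining an absolute and a relative edit-distance
// cutoff; thresholds and the choice of "length" are assumptions.
public class DistanceCutoff {
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Accept a suggestion only if it passes both the absolute and the relative cutoff.
    static boolean acceptable(String term, String suggestion, int maxAbsolute, double maxRelative) {
        int d = levenshtein(term, suggestion);
        double relative = (double) d / Math.max(1, term.length());
        return d <= maxAbsolute && relative <= maxRelative;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(acceptable("teh", "the", 2, 0.7)); // true  (2 edits, 2/3 <= 0.7)
        System.out.println(acceptable("cat", "dog", 3, 0.5)); // false (3 edits, 3/3 > 0.5)
    }
}
```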

RE: Index and Search Phrase Documents

2004-10-18 Thread Chuck Williams
You haven't provided enough information for anybody to help. Have you added indexed Field's to your document? If not, there is nothing to search. I don't think you are looking for a parameter to the IndexWriter constructor. I expect the advice from Aviran is best. You should read and

Re: Atomicity in Lucene operations

2004-10-18 Thread Roy Shan
Maybe you can contribute it to sandbox? On Mon, 18 Oct 2004 08:31:30 -0700 (PDT), Yonik Seeley [EMAIL PROTECTED] wrote: Hi Nader, I would greatly appreciate it if you could CC me on the docs or the code. Thanks! Yonik --- Nader Henein [EMAIL PROTECTED] wrote: It's pretty

QueryParsing

2004-10-18 Thread Rupinder Singh Mazara
hi all i have a question regarding the QueryParser and Proximity Searches I executed the following piece of code String x = "\"jakarta apache\"~100"; QueryParser parser = new QueryParser(FULL_TEXT,new StandardAnalyzer() ); parser.setOperator( QueryParser.DEFAULT_OPERATOR_AND );

sorting on multiple fields

2004-10-18 Thread Angelov, Rossen
Hi, I read the sorting and score ordering - http://www.mail-archive.com/[EMAIL PROTECTED]/msg09775.html thread from the archive and I think, I have a very similar problem but I still don't understand how the sorting is supposed to work if there are multiple fields given to Sort(SortField[])

Re: QueryParsing

2004-10-18 Thread Erik Hatcher
QueryParser does not (currently) support SpanQuery's. PhraseQuery is what you'll always get with double-quoted strings. However, you can customize the behavior and get a SpanQuery instead by subclassing and overriding getPhraseQuery. In fact, this is an example I wrote for Lucene in Action.

RE: Encrypted indexes

2004-10-18 Thread Weir, Michael
Thanks to everyone for their help. I think I will try using symmetric encryption. Two new questions: 1. Is there code available that implements a new type of filesystem directory? 2. Does anyone have any suggestions or warnings? I.e. if Lucene opens all the files in a directory and then randomly
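The second question is about random access. If index files are read with seeks, the chosen symmetric cipher mode matters. A chained mode such as CBC has to be decrypted from the start of its chaining unit. Counter (CTR) mode, by contrast, can start decrypting at any block. The following is a stdlib sketch of that property, not a Lucene Directory implementation. The class name, the all-zero demo key, and the "IV plus block index" counter layout are assumptions. The counter layout matches how the JDK's "AES/CTR/NoPadding" advances its counter as a 128-bit big-endian integer.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: AES in CTR mode lets a reader decrypt any byte range
// directly, which matters if index files are read with random seeks.
public class CtrRandomAccess {
    static final int BLOCK = 16; // AES block size in bytes

    // Counter block for a given block index: initial IV + index, modulo 2^128.
    static byte[] counterFor(byte[] iv, long blockIndex) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex)).toByteArray();
        byte[] out = new byte[BLOCK];
        int len = Math.min(raw.length, BLOCK); // keep only the low 16 bytes
        System.arraycopy(raw, raw.length - len, out, BLOCK - len, len);
        return out;
    }

    static Cipher cipher(int mode, byte[] key, byte[] counter) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(counter));
        return c;
    }

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) {
        try {
            return cipher(Cipher.ENCRYPT_MODE, key, iv).doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts `length` bytes starting at `offset`, beginning at the enclosing
    // block instead of at the start of the file.
    static byte[] decryptAt(byte[] key, byte[] iv, byte[] cipherText, int offset, int length) {
        int blockIndex = offset / BLOCK;
        int skip = offset % BLOCK;
        try {
            byte[] chunk = cipher(Cipher.DECRYPT_MODE, key, counterFor(iv, blockIndex))
                    .doFinal(cipherText, blockIndex * BLOCK, skip + length);
            return Arrays.copyOfRange(chunk, skip, skip + length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // demo only: never use a constant key
        byte[] iv = new byte[16];
        byte[] plain = "a Lucene index file read with random seeks".getBytes(StandardCharsets.US_ASCII);
        byte[] ct = encrypt(key, iv, plain);
        System.out.println(new String(decryptAt(key, iv, ct, 9, 5), StandardCharsets.US_ASCII)); // prints "index"
    }
}
```

One warning that applies to CTR in general: a key and IV pair must never be reused for two different plaintexts, so each file (or each rewrite of a file) needs its own IV.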

Re: sorting on multiple fields

2004-10-18 Thread Daniel Naber
On Monday 18 October 2004 21:25, Angelov, Rossen wrote: The first one represents date in format mmddMMHHSS and the second one are the article headlines. The headlines are probably tokenized, right? Sorting then won't work, I think the API documentation contains some details about this.

RE: sorting on multiple fields

2004-10-18 Thread Angelov, Rossen
Yes, the headline is represented by regular words separated with spaces. I guess this can be considered tokenized. I didn't even think this might cause problems. I'll check the API documentation. Is there any workaround for sorting on tokenized fields? Ross -Original Message- From:

Re: sorting on multiple fields

2004-10-18 Thread Daniel Naber
On Monday 18 October 2004 23:39, Angelov, Rossen wrote: Is there any workaround for sorting on tokenized fields? Just save the field a second time under a different name and use Field.Keyword() for that. Then you can use it for sorting, and still use the original field for searching. Regards
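When Sort is given several SortFields, the results are ordered by the first field, and later fields are used only to break ties. Each sort field needs a single value per document, which a tokenized headline does not provide. That is why Daniel suggests a second, untokenized Field.Keyword copy. The code below is a stdlib analogy of those semantics, not Lucene code. The Article class, the sortable date strings, and the sample headlines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib analogy (not Lucene code) for sorting on two fields: the first key
// orders the results, the second only breaks ties between equal first keys.
public class MultiFieldSort {
    static final class Article {
        final String date;        // lexicographically sortable date, e.g. "20041018" (assumption)
        final String headlineKey; // untokenized copy of the headline, one value per document,
                                  // like a second field added with Field.Keyword
        Article(String date, String headline) {
            this.date = date;
            this.headlineKey = headline;
        }
        @Override public String toString() { return date + " " + headlineKey; }
    }

    static List<Article> sortByDateThenHeadline(List<Article> in) {
        List<Article> out = new ArrayList<>(in);
        out.sort((x, y) -> {
            int byDate = x.date.compareTo(y.date);
            return byDate != 0 ? byDate : x.headlineKey.compareTo(y.headlineKey);
        });
        return out;
    }

    public static void main(String[] args) {
        List<Article> in = new ArrayList<>();
        in.add(new Article("20041018", "Zilverline released"));
        in.add(new Article("20041017", "Lucene news"));
        in.add(new Article("20041018", "Atomicity in Lucene"));
        // Expected order: 20041017 Lucene news, 20041018 Atomicity in Lucene, 20041018 Zilverline released
        for (Article a : sortByDateThenHeadline(in)) System.out.println(a);
    }
}
```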

Zilverline release candidate 1.0-rc7 available

2004-10-18 Thread Zilverline info
All, I've just released a new candidate (*1.0-rc7*) New features include Highlighting and 'on-the-fly' extraction of archives. Zilverline is a search engine based on lucene that's ready to roll, and can be simply dropped in a Servlet Engine. It runs out of the box, and supports PDF, WORD, HTM,