Error-Tolerant

2007-04-04 Thread Mohsen Saboorian

Error Tolerant Query Parser

2007-04-04 Thread Mohsen Saboorian
Sorry for the double post. I inadvertently submitted the form before writing the body :) Has an error-tolerant query parser ever been written for Lucene? How do websites typically handle advanced searching with Lucene?

Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Dear all, My problem is a little bit strange. Instead of passing the content of the document to the indexer, I am adding values one by one. Here is a piece of my code: Document doc = new Document(); doc.add(Field.Text("Features", "blue")); doc.add(Field.Text("Features", "beautiful"));

Re: Unique City, State results from index based on zip

2007-04-04 Thread Jokin Cuadrado
Don't index the city names with the zip codes.
indexed text - stored value
94941 - 94941 Mill Valley
94114 - 94114 Mill Valley
Mill Valley - Mill Valley
29715 - 29715 Fort Mill
29708 - 29708 Fort

Field.lazy setter method?

2007-04-04 Thread jafarim
Hi, I wonder why there is no setter method for the lazy member variable in the Field class. Does that mean the property is nominal and setting it has no effect, or am I missing something? Anyway, is there any way to tell Lucene that a field is to be lazy-loaded, from the very beginning

Re: Field.lazy setter method?

2007-04-04 Thread Yonik Seeley
On 4/4/07, jafarim [EMAIL PROTECTED] wrote: Anyway, is there any way to tell Lucene that a field is to be lazy-loaded, from the very beginning of field construction? No, that data is not stored in the index. Lazy field loading is specified only when retrieving the stored fields of a document,

Re: Field.lazy setter method?

2007-04-04 Thread Grant Ingersoll
Lazy loading is handled through the FieldSelector interface on IndexReader.document() and some variations. Nothing special needs to be done during indexing to mark a field as lazy. The isLazy method merely lets you know later, after loading a Document, whether the field is, indeed, lazy.

Re: Field.lazy setter method?

2007-04-04 Thread jafarim
So what is the use of this property in the Field class? On 4/4/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/4/07, jafarim [EMAIL PROTECTED] wrote: Anyway, is there any way to tell Lucene that a field is to be lazy-loaded, from the very beginning of field construction? No, that data is

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Laxmilal Menaria
Hello, you can try this code: IndexReader ISer = IndexReader.open("C:/Testindex"); TermEnum te = ISer.terms(new Term("Features", "blue")); Term te1 = te.term(); System.out.println("Frequency of blue " + ISer.docFreq(te1)); Regards, -LM On 4/4/07, Sengly Heng [EMAIL

Re: Unique City, State results from index based on zip

2007-04-04 Thread Erick Erickson
The default operator for QueryParser is OR, so what you may really be getting is hits on Mill, and Valley is irrelevant. But this is just a guess; it would be far more helpful if you told us what your index structure is and what query you actually submitted, for which query.toString is really

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Erick Erickson
See below. On 4/4/07, Sengly Heng [EMAIL PROTECTED] wrote: Dear all, My problem is a little bit strange. Instead of passing the content of the document to the indexer, I am adding values one by one. Here is a piece of my code: Document doc = new Document(); doc.add(Field.Text("Features", "blue"));

Re: IndexWriter Quandry

2007-04-04 Thread Michael McCandless
Kvailis [EMAIL PROTECTED] wrote: I'm pretty new to Lucene (2.0.0) and am having an issue with the IndexWriter: if I set the boolean argument to 'true' it goes ahead and writes indexes that turn out to be perfectly usable; taking the same exact code and switching the boolean to 'false'

Explanation from FunctionQuery

2007-04-04 Thread Annona Keene
I'm hoping someone can offer some insight into the FunctionQuery. I've just discovered this, and I think it's exactly what I've been looking for, but I'm having some trouble getting it to work. I can create and execute the query, but if I try to see the Explanation, I get an

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Thanks so much for your explanation. But there is one thing I want to make sure of: if I add the same token to the same field more than once, is it stored redundantly internally? And if I have many fields, what is the best way to list the frequency of all the tokens from different

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Thank you. But I found that the result is always 1, even when I input a token that does not appear in the doc at all. What happened? Best, Sengly On 4/4/07, Laxmilal Menaria [EMAIL PROTECTED] wrote: Hello, you can try this code: IndexReader ISer = IndexReader.open("C:/Testindex");
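
One possible explanation for the constant result, offered as a guess rather than a diagnosis: IndexReader.terms(Term) positions the enumeration at the first term at or after the requested one, so te.term() can be a different term from the one asked for. The sketch below is not Lucene code. It uses a TreeMap as a stand-in for the term dictionary (the class name, method names, and sample data are all hypothetical, and the field part of a term is ignored) to show why the landed-on term should be compared with the requested one before its count is trusted.

```java
import java.util.TreeMap;

// Stand-in for a term dictionary: term text -> number of documents containing it.
// NOT Lucene code; TreeMap.ceilingKey mimics "seek to the first term >= target".
class TermSeekSketch {
    static final TreeMap<String, Integer> DOC_FREQ = new TreeMap<>();
    static {
        DOC_FREQ.put("beautiful", 1);
        DOC_FREQ.put("blue", 1);
        DOC_FREQ.put("sky", 1);
    }

    // Naive lookup: trusts whatever term the seek landed on.
    static int naiveDocFreq(String text) {
        String landed = DOC_FREQ.ceilingKey(text);
        return landed == null ? 0 : DOC_FREQ.get(landed);
    }

    // Checked lookup: only reports a count when the landed-on term is an exact match.
    static int checkedDocFreq(String text) {
        String landed = DOC_FREQ.ceilingKey(text);
        return text.equals(landed) ? DOC_FREQ.get(landed) : 0;
    }
}
```

With this sample data, a lookup for a missing token such as "green" lands on "sky" in the naive version and reports that term's count, while the checked version reports 0.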

Re: Error Tolerant Query Parser

2007-04-04 Thread Otis Gospodnetic
Hm, error tolerant query parser? How do you want to handle queries with invalid syntax? Here is one way: try { QueryParser qp = new QueryParser(.); Query q = qp.parse(); } catch (Throwable t) { // tolerate any exception } ;) Bad but quite tolerant. Otis
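
A slightly less drastic variant of the same idea, as a minimal sketch: try the query as typed and, if parsing fails, retry with progressively more conservative forms (for example an escaped copy). The Parser interface and parseFirstValid helper are hypothetical stand-ins, not Lucene types; with Lucene, the parser argument would wrap a QueryParser call. The sketch catches Exception rather than Throwable so that errors such as OutOfMemoryError still propagate.

```java
import java.util.Optional;

class TolerantParsing {
    // Hypothetical stand-in for a query parser; not a Lucene type.
    interface Parser<Q> {
        Q parse(String text) throws Exception;
    }

    // Returns the result of the first candidate that parses, or empty if none do.
    static <Q> Optional<Q> parseFirstValid(Parser<Q> parser, String... candidates) {
        for (String candidate : candidates) {
            try {
                return Optional.of(parser.parse(candidate));
            } catch (Exception e) {
                // Parse failure: fall through to the next, more conservative form.
            }
        }
        return Optional.empty();
    }
}
```

Callers would pass the raw user input first and a sanitized version second, then decide what to show the user when the result is empty.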

How many Searches is a Searcher Worth?

2007-04-04 Thread Craig W Conway
I am using an RMI architecture for calling a remote service which uses an IndexSearcher in its own JVM. I am starting the service with the following provisions for memory allocation and garbage collection: java -server -Xmx1024m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC After about 1000 search

Re: How many Searches is a Searcher Worth?

2007-04-04 Thread Otis Gospodnetic
No reason that I can think of. What makes you think the problem is with the IndexSearcher? Maybe it's something else in your code, for instance. Make sure you have the same version of Java on both ends of the call. Also, Java 6 made our RMI calls a lot more stable than even 1.5. Otis . . .

Re: Design Problem: Searching large set of protected documents

2007-04-04 Thread Paul Elschot
On Wednesday 04 April 2007 01:32, Erick Erickson wrote: I thought you could simply add a ConstantScoreQuery (whose constructor takes a Filter) to a BooleanQuery. It seems that doing this at the very top level with a MUST would do the trick. I have not tried this myself, but indeed this

Better parsing of Queries

2007-04-04 Thread Simon Wistow
I'm looking for some advice on dealing with malformed queries. If a user searches for yow! then I get an exception from the query parser. I can get round this by using QueryParser.escape(query) first but then that prevents them from searching using other bits of the query syntax such as

Re: Better parsing of Queries

2007-04-04 Thread Erick Erickson
About all you can do is roll your own. I suspect a decent regular expression would work, or you could let Lucene escape the query and then re-replace all \: with : Erick On 4/4/07, Simon Wistow [EMAIL PROTECTED] wrote: I'm looking for some advice on dealing with malformed queries. If a user
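
The escape-then-restore suggestion above can be sketched as follows. This is an illustration only: escapeAll is a hand-written approximation standing in for QueryParser.escape, and the list of special characters is an assumption, not taken from Lucene's source. Note one limitation of the approach: a backslash-colon sequence that the user typed deliberately is altered by the final replace.

```java
class FieldAwareEscape {
    // Assumed set of characters the query syntax treats as special (approximation).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    // Prefix every special character with a backslash.
    static String escapeAll(String query) {
        StringBuilder sb = new StringBuilder();
        for (char c : query.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape everything, then turn "\:" back into ":" so field:value syntax survives.
    static String escapeKeepingColons(String query) {
        return escapeAll(query).replace("\\:", ":");
    }
}
```

For an input like title:yow! this keeps the field prefix usable while the exclamation mark stays escaped.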

distinct term values?

2007-04-04 Thread Ryan McKinley
Is there an efficient way to know how many distinct terms there are for a given field name? I know I can walk through a TermEnum and put them into a hash, but it would be useful to know beforehand if you are going to get 4 distinct values or 40,000. I don't need to know what the terms are, just

Re: distinct term values?

2007-04-04 Thread Yonik Seeley
On 4/4/07, Ryan McKinley [EMAIL PROTECTED] wrote: Is there an efficient way to know how many distinct terms there are for a given field name? I know I can walk through a TermEnum and put them into a hash No hash needed... just walk through the TermEnum and count. -Yonik
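
The counting walk described above can be sketched without a hash because the enumeration is sorted and never repeats a term. The code below is a stand-in, not Lucene code: a list of {field, text} pairs sorted by field and then text plays the role of the TermEnum, and the class and method names are hypothetical.

```java
import java.util.List;

class DistinctTermCount {
    // sortedTerms: {field, text} pairs sorted by field then text, with no duplicates,
    // standing in for the order in which a term enumeration yields terms.
    static int countTerms(List<String[]> sortedTerms, String field) {
        int count = 0;
        boolean inField = false;
        for (String[] term : sortedTerms) {
            if (term[0].equals(field)) {
                count++;          // each term appears once, so a plain counter suffices
                inField = true;
            } else if (inField) {
                break;            // walked past the field's contiguous block
            }
        }
        return count;
    }
}
```

A real enumeration could be positioned at the first term of the field instead of scanning from the start, but the stopping rule is the same: quit as soon as the field name changes.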

Re: distinct term values?

2007-04-04 Thread Erick Erickson
Sorry if this is a double post, but my last attempt failed. Not that I know of, but I think you'll be surprised how fast TermEnum will walk the list of terms. I think you misunderstand TermEnum. It will NOT enumerate a term twice, so there's no need for a hash, just a simple increment of

Re: Range search in numeric fields

2007-04-04 Thread Peter W .
Andy, MemoryCachedRangeFilter looks nice, can't wait for it to be included with other goodies in the next 2.x point release! Numeric range search questions come up often for Lucene; best practices probably include working with BitSets directly (which I have been unable to grok), using queries

Re: Benchmarking LUCENE-584 with contrib/benchmark

2007-04-04 Thread Otis Gospodnetic
Hi Doron, Yes, this was great help, thanks! I've got my: 1. MatchTask (just like ReadTask, but with searcher.match(Query, new MatchCollector() )) 2. SearchMatchTask (just like SearchTask, but extends MatchTask), so I was able to use SearchMatch in the alg file where Search was before. I

Re: distinct term values?

2007-04-04 Thread Ryan McKinley
TermEnum works like a charm, no need to optimize (yet). Enjoy the Merlot! On 4/4/07, Erick Erickson [EMAIL PROTECTED] wrote: Sorry if this is a double post, but my last attempt failed.. Not that I know of, but I think you'll be surprised how fast TermEnum will walk the list of terms. I