RE: An interesting thing

2006-06-12 Thread Flik Shen
I understand why buffered indexing seems to run faster. It appears that the initialization operation takes noticeable time and impacts indexing performance. I found that RAM indexing is faster if I run buffered indexing prior to RAM indexing, so I think the addDocument method takes more time at first

Re: Asserting that a value must match the entire content of a field

2006-06-12 Thread Erik Hatcher
On Jun 12, 2006, at 2:04 AM, Shivani Sawhney wrote: Are you saying that there is no out-of-the-box way of doing this...? Well, there are lots of techniques for all sorts of tricks with Lucene. What you're basically asking for is an untokenized, indexed field and a TermQuery to find a

RE: Asserting that a value must match the entire content of a field

2006-06-12 Thread Shivani Sawhney
Ok... I'll explain the problem that I am facing with an example... I have several fields for the documents that I index; one of the fields is 'title'. Now I have provided the user with a screen to search for documents with a particular title. Let's assume that the value inputted by the user is

RE: Asserting that a value must match the entire content of a field

2006-06-12 Thread Flik Shen
I think you could follow Erik's advice. You could index your document's title as an un-tokenized field. Then the searcher will treat the title as a whole string. If you want the document to also be hit when the user inputs Lifecycle, you will have to find another way to solve your problem. -Original
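For reference, a minimal sketch of the approach Erik and Flik describe, written against the Lucene 1.9/2.0-era API (the field name, title value, and index path here are made up for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class ExactTitleMatch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/title-idx", new StandardAnalyzer(), true);
        Document doc = new Document();
        // UN_TOKENIZED: the title is indexed as one single term,
        // so only the exact full string will match it
        doc.add(new Field("title", "Employee Lifecycle",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher("/tmp/title-idx");
        // TermQuery bypasses the analyzer and matches the indexed term verbatim;
        // a query for just "Lifecycle" would find nothing
        Hits hits = searcher.search(new TermQuery(new Term("title", "Employee Lifecycle")));
        System.out.println(hits.length() + " exact match(es)");
        searcher.close();
    }
}
```

As the thread notes, partial matches like "Lifecycle" require a second, tokenized copy of the field.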

Best solution for the Date Range problem

2006-06-12 Thread Björn Ekengren
Hi, I would like users to be able to search on both terms and within a date range. The solutions I have come across so far are: 1. Use the default QueryParser which will use RangeQuery which will expand into a number of Boolean clauses. It is quite likely that this will run into the

Using more than one index

2006-06-12 Thread Mile Rosu
Hello, We have an application dealing with historical books. The books have metadata consisting of event dates, and person names among others. The FullText, Person and Date indexes were split until we realized that for a larger number of documents (400K) the combination of the sequential search

Retrieving a document from a Hits object

2006-06-12 Thread Michael Dodson
Hi, I'm trying to retrieve a document from a Hits object and I'm getting the following exception and stack trace. I have a Hits object named hits and I'm just trying to retrieve the first document using Document doc = hits.doc(0); I can retrieve all other kinds of information for the

RE: Best solution for the Date Range problem

2006-06-12 Thread Mile Rosu
Hello, You might consider using the suggestion at http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing We successfully used it to search wide date ranges over a relatively large number of date records. This approach greatly simplifies the query you are suggesting in (3).

RE: Best solution for the Date Range problem

2006-06-12 Thread Satuluri, Venu_Madhav
You can also try using a ConstantScoreRangeQuery in lieu of the plain RangeQuery. http://lucene.apache.org/java/docs/api/org/apache/lucene/search/ConstantScoreRangeQuery.html Regards, Venu -Original Message- From: Mile Rosu [mailto:[EMAIL PROTECTED] Sent: Monday, June 12, 2006 5:20 PM
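A short sketch of Venu's suggestion, combining a term clause with a ConstantScoreRangeQuery (Lucene 1.9/2.0-era API; the field names "contents" and "date" are hypothetical):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreRangeQuery;
import org.apache.lucene.search.TermQuery;

public class DateRangeSketch {
    public static BooleanQuery termWithinRange(String term, String from, String to) {
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("contents", term)), BooleanClause.Occur.MUST);
        // ConstantScoreRangeQuery matches via a filter bitset rather than
        // expanding into one BooleanClause per term in the range, so a wide
        // date range cannot trigger BooleanQuery.TooManyClauses
        q.add(new ConstantScoreRangeQuery("date", from, to, true, true),
              BooleanClause.Occur.MUST);
        return q;
    }
}
```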

Getting count on distinct values of a field.

2006-06-12 Thread vipin sharma
Hi, I am having a problem getting the count of distinct values of a field. The reason I need this is that each document in the index belongs to one predefined class, and I want to get the number of documents belonging to each class. Regards..
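One way to get per-value document counts with the 1.x API is to walk the TermEnum for the field and read each term's docFreq. This is a sketch under two assumptions: the field is named "class" (hypothetical), each document holds exactly one value for it, and note that docFreq still counts deleted documents until the index is optimized:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class ClassCounts {
    public static void printCounts(IndexReader reader) throws Exception {
        // terms(Term) positions the enum at the first term >= ("class", "")
        TermEnum terms = reader.terms(new Term("class", ""));
        try {
            while (terms.term() != null && "class".equals(terms.term().field())) {
                // docFreq = number of documents containing this term
                System.out.println(terms.term().text() + ": " + terms.docFreq());
                if (!terms.next()) break;
            }
        } finally {
            terms.close();
        }
    }
}
```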

Re: IndexWriter.addIndexes optimization

2006-06-12 Thread heritrix . lucene
I want to index 1 billion documents. Which one do you think (I mean FSDirectory or RAMDirectory) is suitable for indexing that many documents? On 6/12/06, Flik Shen [EMAIL PROTECTED] wrote: It means that picking both a high maxBufferedDocs and a high mergeFactor will improve your indexing performance.

Does more memory help Lucene?

2006-06-12 Thread Nadav Har'El
Hi, I am trying to index a huge collection of documents - several hundreds of gigabytes. Needless to say, I'm trying to squeeze every ounce of performance from my machine, to get this indexing done in a sensible amount of time. Making use of the fact that my machine has two CPUs was easy: I

Re: Does more memory help Lucene?

2006-06-12 Thread Michael D. Curtin
Nadav Har'El wrote: What I couldn't figure out how to use, however, was the abundant memory (2 GB) that this machine has. I tried playing with IndexWriter.setMaxBufferedDocs(), and noticed that there is no speed gain after I set it to 1000, at which point the running Lucene takes up just 70 MB

Re: Does more memory help Lucene?

2006-06-12 Thread Nadav Har'El
Michael D. Curtin [EMAIL PROTECTED] wrote on 12/06/2006 03:49:53 PM: Nadav Har'El wrote: What I couldn't figure out how to use, however, was the abundant memory (2 GB) that this machine has. I tried playing with IndexWriter.setMaxBufferedDocs(), and noticed that there is no speed gain

Re: Does more memory help Lucene?

2006-06-12 Thread Otis Gospodnetic
Nadav, Look up one of my onjava.com Lucene articles, where I talk about this. You may also want to tell Lucene to merge segments on disk less frequently, which is what mergeFactor does. Otis - Original Message From: Nadav Har'El [EMAIL PROTECTED] To: java-user@lucene.apache.org
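The two knobs discussed in this thread are both setters on IndexWriter. A minimal sketch (the path, analyzer choice, and the particular values are only illustrative; the thread itself reports diminishing returns above maxBufferedDocs=1000):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class TunedWriter {
    public static IndexWriter open(String path) throws Exception {
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);
        // Buffer more documents in RAM before flushing a new segment to disk
        writer.setMaxBufferedDocs(1000);
        // Merge on-disk segments less frequently (the default is 10)
        writer.setMergeFactor(50);
        return writer;
    }
}
```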

Re: Asserting that a value must match the entire content of a field

2006-06-12 Thread Otis Gospodnetic
Don't tokenize the title field. Use Index.UN_TOKENIZED when constructing the title Field. Otis - Original Message From: Shivani Sawhney [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Monday, June 12, 2006 3:58:24 AM Subject: RE: Asserting that a value must match the entire

Re: Does more memory help Lucene?

2006-06-12 Thread Nadav Har'El
Otis Gospodnetic [EMAIL PROTECTED] wrote on 12/06/2006 04:36:45 PM: Nadav, Look up one of my onjava.com Lucene articles, where I talk about this. You may also want to tell Lucene to merge segments on disk less frequently, which is what mergeFactor does. Thanks. Can you please point me to

Related documents ...

2006-06-12 Thread Dragon Fly
Hi, I have an index that contains 3 fields: Book Id, Book Title, and Related Book Ids. For example:
Book Id   Book Title   Related Book Ids
A0001     Title 1      A0003, A0004
A0002     Title 2
A0003     Title 3      A0001, A0002
A0004     Title

Re: Does more memory help Lucene?

2006-06-12 Thread Dan Armbrust
The reason I'm asking this that I'm still trying to figure out whether having a machine with huge ram actually helps Lucene, or not. Thanks, Nadav. Memory can help a little at index time, but you will mostly be Disk / IO bound. How fast can you read your data in, how fast can you write

Handling an end user query from multiple indexes

2006-06-12 Thread Hycel Taylor
Hi, I'm new to Lucene. I'm unsure how to use the QueryParser to execute a search built from an end-user request. For example, if an end user enters the query new york and selects content from a pull-down list, I can easily translate that request with the QueryParser as the

Re: COMMIT_LOCK_TIMEOUT - IndexSearcher/IndexReader

2006-06-12 Thread Michael Duval
You're absolutely right, in most cases there would never be a need to increase the COMMIT_LOCK_TIMEOUT. In fact, if anything, you would want to decrease it to prevent wait bottlenecks on a system with a heavy update load. In short, it would be nice to have the option to change it to suit

Re: Does more memory help Lucene?

2006-06-12 Thread Michael D. Curtin
Nadav Har'El wrote: Otis Gospodnetic [EMAIL PROTECTED] wrote on 12/06/2006 04:36:45 PM: Nadav, Look up one of my onjava.com Lucene articles, where I talk about this. You may also want to tell Lucene to merge segments on disk less frequently, which is what mergeFactor does. Thanks. Can

How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Ramana Jelda
Hi, It seems analyzers never get called for untokenized fields (no luck either using PerFieldAnalyzer). What should I do if I would like to use an analyzer for untokenized fields, say for Keyword or Unstored fields? I basically would like to use Lucene's Sort functionality on

Re: Aggregating category hits

2006-06-12 Thread Peter Keegan
I'm seeing query throughput of approx. 290 qps with OpenBitSet vs. 270 with BitSet. I had to reduce the max. HashDocSet size to 2K - 3K (from 10K-20K) to get optimal tradeoff. no. docs in index: 730,000 average no. results returned: 40 average response time: 50 msec (15-20 for counting facets)

RE: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Mordo, Aviran (EXP N-NANNATEK)
What you are asking is not possible. The whole purpose of the analyzer is to tokenize the fields, so if you want them tokenized, don't use Keyword fields. If you want both tokenized and untokenized, just create another field that will be tokenized. Aviran

RE: Related documents ...

2006-06-12 Thread Mordo, Aviran (EXP N-NANNATEK)
You'll need to run two queries: one for the user's query. Then, if you need the related books, collect all the related book IDs from the results and build a second query against the BookId field (an OR query over all the related book IDs). Then merge the
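Aviran's second query can be built as a BooleanQuery of SHOULD clauses, which is Lucene's OR. A sketch with the 1.9-era API (the field name "bookId" is assumed from the example table):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class RelatedBooksQuery {
    // OR together one TermQuery per related book id collected
    // from the first query's results
    public static BooleanQuery forIds(String[] relatedIds) {
        BooleanQuery q = new BooleanQuery();
        for (int i = 0; i < relatedIds.length; i++) {
            q.add(new TermQuery(new Term("bookId", relatedIds[i])),
                  BooleanClause.Occur.SHOULD);
        }
        return q;
    }
}
```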

Re: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Steven Rowe
Mordo, Aviran (EXP N-NANNATEK) wrote: What you are asking is not possible. The whole purpose of the analyzer is to tokenize the fields, so if you want them to be tokenized don't use the Keyword fields. Um, KeywordAnalyzer?

Breaking up an index

2006-06-12 Thread Dennis Kubes
Is there a way to break up a single large index into many smaller indexes? Dennis - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
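There is no built-in splitter in this era of Lucene, but one possible sketch is to round-robin documents from the large index into several new ones. An important caveat, loudly stated: only stored fields survive this copy; index-only (unstored) fields are lost and would have to be rebuilt from the source data, so this is only viable if every field you care about is stored:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class IndexSplitter {
    public static void split(String srcPath, String[] destPaths) throws Exception {
        IndexReader reader = IndexReader.open(srcPath);
        IndexWriter[] writers = new IndexWriter[destPaths.length];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new IndexWriter(destPaths[i], new StandardAnalyzer(), true);
        }
        for (int docId = 0; docId < reader.maxDoc(); docId++) {
            if (reader.isDeleted(docId)) continue;  // skip deleted docs
            // document(docId) returns only the STORED fields, which are
            // then re-analyzed and re-indexed by the destination writer
            writers[docId % writers.length].addDocument(reader.document(docId));
        }
        for (int i = 0; i < writers.length; i++) writers[i].close();
        reader.close();
    }
}
```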

Re: Handling an end user query from multiple indexes

2006-06-12 Thread Grant Ingersoll
See below. Hycel Taylor wrote: Hi, I'm new to Lucene. I'm unsure how to use the QueryParser to execute a search built from an end-user request. For example, if an end user enters the query new york and selects content from a pull-down list, I can easily translate that request with
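The usual pattern for this kind of UI is to feed the pull-down selection to QueryParser as its default field. A hedged sketch (Lucene 1.9/2.0-era constructor; the analyzer choice is an assumption and should match whatever analyzer was used at index time):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class UserQuery {
    // selectedField comes from the pull-down list (e.g. "content");
    // userInput is the raw query string typed by the user
    public static Query parse(String userInput, String selectedField) throws Exception {
        QueryParser parser = new QueryParser(selectedField, new StandardAnalyzer());
        return parser.parse(userInput);
    }
}
```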

RE: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-12 Thread Mordo, Aviran (EXP N-NANNATEK)
The KeywordAnalyzer does not do anything, it just returns the whole phrase as a single term, just as if you didn't use an analyzer at all. Aviran http://www.aviransplace.com -Original Message- From: Steven Rowe [mailto:[EMAIL PROTECTED] Sent: Monday, June 12, 2006 1:50 PM To:
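Since analyzers are skipped for untokenized fields, the practical workaround for Ramana's sorting use case is to normalize the value by hand before adding it as a separate sort-only field. A sketch (the "_sort" suffix and the lowercase/trim normalization are illustrative choices, not a Lucene convention):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class SortFieldHelper {
    // Analyzers never run on UN_TOKENIZED fields, so apply the
    // normalization yourself before the value goes into the index
    public static void addSortField(Document doc, String name, String value) {
        String normalized = value.trim().toLowerCase();
        doc.add(new Field(name + "_sort", normalized,
                Field.Store.NO, Field.Index.UN_TOKENIZED));
    }
}
```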

Re: Does more memory help Lucene?

2006-06-12 Thread Peter Keegan
See my note about overlapping indexing documents with merging: http://www.gossamer-threads.com/lists/lucene/java-user/34188?search_string=%2Bkeegan%20%2Baddindexes;#34188 Peter On 6/12/06, Michael D. Curtin [EMAIL PROTECTED] wrote: Nadav Har'El wrote: Otis Gospodnetic [EMAIL PROTECTED]

Re: Handling a end user query from multiple indexes

2006-06-12 Thread Hycel Taylor
Thanks for the help :-)

Re: Retrieving a document from a Hits object

2006-06-12 Thread Yonik Seeley
Hi Michael, The Searcher you used to get the Hits needs to remain open while accessing the hits. Your stack trace could have been caused by the Searcher being closed first. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server On 6/10/06, Michael Dodson [EMAIL
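Yonik's point can be made concrete: hits.doc(n) lazily fetches the document from the index, so the Searcher must outlive every access to the Hits. A sketch of the safe ordering (field/term names are hypothetical):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class SafeHitsAccess {
    public static Document firstHit(String indexPath) throws Exception {
        IndexSearcher searcher = new IndexSearcher(indexPath);
        try {
            Hits hits = searcher.search(new TermQuery(new Term("contents", "lucene")));
            // hits.doc(0) reads from the index, so it must happen while
            // the searcher is still open; closing first causes the
            // exception described in the original post
            return hits.length() > 0 ? hits.doc(0) : null;
        } finally {
            searcher.close();  // only after all Hits accesses are done
        }
    }
}
```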

RE: question with spellchecker

2006-06-12 Thread Van Nguyen
I'll experiment with both. Thanks... -Original Message- From: mark harwood [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 07, 2006 2:16 AM To: java-user@lucene.apache.org Subject: Re: question with spellchecker I think the problem in your particular example is the suggestion software

updating index - web application

2006-06-12 Thread Van Nguyen
I've been playing around with Lucene for a while now. I'm pretty comfortable with creating an index and searching against it. Up until now, I've been using the LuceneIndexAccessor package contributed by Maik Schreiber and that's working well for me. Now the next obstacle is to figure out

Re: IndexWriter.addIndexes optimization

2006-06-12 Thread Erick Erickson
a billion? Wow! First, I really, really, really doubt you can use a RAMdir to index a billion documents. I'd be interested in the parameters of your problem if you could. I'd be especially interested in providing a home for any of your old hardware, since I bet it beats mine all to hell G.

Re: updating index - web application

2006-06-12 Thread Chris Lu
My approach, which I think is common, is to use the Quartz scheduler. Chris - Instant Lucene Search on Any Databases/Applications http://www.dbsight.net On 6/12/06, Van Nguyen [EMAIL PROTECTED] wrote: I've been playing around with Lucene for a while now. I'm pretty
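Chris suggests Quartz; for illustration, the same periodic-reindex idea can be sketched with the plain JDK 5 ScheduledExecutorService instead (a deliberate stand-in, not what Chris's setup uses):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReindexScheduler {
    // Runs the given reindex task repeatedly, waiting periodMillis
    // between the end of one run and the start of the next, so a slow
    // reindex can never overlap with itself
    public static ScheduledExecutorService start(Runnable reindexTask, long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(reindexTask, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

In a web application the returned executor should be shut down from the servlet context listener when the application stops.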

Re: IndexWriter.addIndexes optimization

2006-06-12 Thread heritrix . lucene
Hi, I have processed approximately 50 million so far. I set both mergeFactor and maxBufferedDocs to 1000; I arrived at this value after several rounds of test runs. The indexing rate over the 50M documents is one document per 4.85 ms. I am only using FSDirectory. Is there any other way to reduce this time?