I understand why buffered indexing seems to run faster.
It appears that initialization takes noticeable time and hurts
indexing performance.
I found that RAM indexing is faster if I run buffered indexing prior to RAM
indexing.
So I think the addDocument method takes more time on the first call.
On Jun 12, 2006, at 2:04 AM, Shivani Sawhney wrote:
Are you saying that there is no out-of-the-box way of doing this...?
Well, there are lots of techniques for all sorts of tricks with
Lucene. What you're basically asking for is an untokenized, indexed
field and a TermQuery to find an exact match.
Ok...
I'll explain the problem that I am facing with an example...
I have several fields for the documents that I index; one of the fields is
'title'. Now I have provided the user with a screen to search for documents
with a particular title. Let's assume that the value entered by the user is
I think you could follow Erik's advice.
You could index your document's title as an un-tokenized field.
The searcher will then treat the title as a whole string.
If you want the document to also be hit when the user inputs just Lifecycle,
you will have to find another way to solve your problem.
-Original Message-
Hi,
I would like users to be able to search on both terms and within a date
range. The solutions I have come across so far are:
1. Use the default QueryParser, which will use RangeQuery, which will expand
into a number of Boolean clauses. It is quite likely that this will run into
the maximum clause limit (BooleanQuery.TooManyClauses).
Hello,
We have an application dealing with historical books. The books have
metadata consisting of event dates and person names, among other fields.
The FullText, Person and Date indexes were split until we realized that
for a larger number of documents (400K) the combination of the
sequential search
Hi,
I'm trying to retrieve a document from a Hits object and I'm getting
the following exception and stack trace. I have a Hits object named
hits and I'm just trying to retrieve the first document using
Document doc = hits.doc(0);
I can retrieve all other kinds of information for the
Hello,
You might consider using the suggestion at
http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing
We successfully used it to search for wide date ranges, on a relatively large
number of date records.
This approach greatly simplifies the query you are suggesting in (3).
You can also try using a ConstantScoreRangeQuery in lieu of the plain
RangeQuery.
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/ConstantScoreRangeQuery.html
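For wide ranges, something along these lines (a sketch; the "date" field
name and yyyymmdd term format are assumptions, adjust to your schema):

    import org.apache.lucene.search.ConstantScoreRangeQuery;
    import org.apache.lucene.search.Query;

    // Matches all docs with date in [19000101, 19451231], bounds inclusive.
    // Unlike RangeQuery, this does not expand into one BooleanClause per
    // matching term, so wide ranges cannot trigger TooManyClauses.
    Query dateRange = new ConstantScoreRangeQuery(
        "date", "19000101", "19451231", true, true);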
Regards,
Venu
-Original Message-
From: Mile Rosu [mailto:[EMAIL PROTECTED]
Sent: Monday, June 12, 2006 5:20 PM
Hi,
I am having a problem getting the count of distinct values of a field. The
reason for needing this value is that each document in the index belongs to
one predefined class, and I want to get the number of documents belonging
to each class.
Regards..
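One way to get these counts (a sketch, assuming the class field is indexed
un-tokenized and every document has exactly one class; note that docFreq
also counts deleted documents until the index is optimized):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    IndexReader reader = IndexReader.open("/path/to/index");
    // Position the enumeration at the first term of the "class" field.
    TermEnum terms = reader.terms(new Term("class", ""));
    try {
        do {
            Term t = terms.term();
            if (t == null || !t.field().equals("class")) break;
            // docFreq = number of documents containing this class value.
            System.out.println(t.text() + ": " + reader.docFreq(t));
        } while (terms.next());
    } finally {
        terms.close();
        reader.close();
    }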
I want to index 1 billion documents. what do you think which one (i mean
using fsDir or ramDir) is suitable for indexing these many documents.
On 6/12/06, Flik Shen [EMAIL PROTECTED] wrote:
It means that picking both a high maxBufferedDocs and a high mergeFactor
will improve your indexing performance.
Hi,
I am trying to index a huge collection of documents - several hundreds of
gigabytes.
Needless to say, I'm trying to squeeze every ounce of performance from my
machine, to get this indexing done in a sensible amount of time.
Making use of the fact that my machine has two CPUs was easy: I
Nadav Har'El wrote:
What I couldn't figure out how to use, however, was the abundant memory (2
GB) that this machine has.
I tried playing with IndexWriter.setMaxBufferedDocs(), and noticed that
there is no speed gain after I set it to 1000, at which point the running
Lucene takes up just 70 MB
Michael D. Curtin [EMAIL PROTECTED] wrote on 12/06/2006 03:49:53 PM:
Nadav Har'El wrote:
What I couldn't figure out how to use, however, was the abundant memory
(2 GB) that this machine has.
I tried playing with IndexWriter.setMaxBufferedDocs(), and noticed that
there is no speed gain
Nadav,
Look up one of my onjava.com Lucene articles, where I talk about this. You may
also want to tell Lucene to merge segments on disk less frequently, which is
what mergeFactor does.
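For example (a sketch; the path, analyzer, and values are only
illustrative, tune them for your data):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter writer = new IndexWriter("/path/to/index",
                                         new StandardAnalyzer(), true);
    writer.setMaxBufferedDocs(1000); // buffer more docs in RAM before flushing
    writer.setMergeFactor(50);       // merge on-disk segments less frequently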
Otis
- Original Message
From: Nadav Har'El [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Don't tokenize the title field. Use Index.UN_TOKENIZED when constructing the
title Field.
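For example (a sketch; the field value is illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    // Indexing: store the whole title as a single indexed term.
    Document doc = new Document();
    doc.add(new Field("title", "Lifecycle Management",
                      Field.Store.YES, Field.Index.UN_TOKENIZED));

    // Searching: an exact match against the entire title.
    Query q = new TermQuery(new Term("title", "Lifecycle Management"));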
Otis
- Original Message
From: Shivani Sawhney [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Monday, June 12, 2006 3:58:24 AM
Subject: RE: Asserting that a value must match the entire
Otis Gospodnetic [EMAIL PROTECTED] wrote on 12/06/2006 04:36:45
PM:
Nadav,
Look up one of my onjava.com Lucene articles, where I talk about
this. You may also want to tell Lucene to merge segments on disk
less frequently, which is what mergeFactor does.
Thanks. Can you please point me to
Hi,
I have an index that contains 3 fields: Book Id, Book Title, and Related
Book Ids.
For example:
Book Id   Book Title   Related Book Ids
=======   ==========   ================
A0001     Title 1      A0003, A0004
A0002     Title 2
A0003     Title 3      A0001, A0002
A0004     Title
The reason I'm asking this is that I'm still trying to figure out whether
having a machine with huge RAM actually helps Lucene or not.
Thanks,
Nadav.
Memory can help a little at index time, but you will mostly be Disk / IO
bound. How fast can you read your data in, how fast can you write
Hi,
I'm new to Lucene. I'm unsure how to use the QueryParser to execute a
search built from an end-user request.
For example, if an end user enters the query new york and selects
content from a pull-down list, I can easily translate that request
with the QueryParser as the
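For the simple case, the basic pattern looks like this (a sketch; the
field name and analyzer are assumptions):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    // Parse the user's input against the field chosen from the pull-down.
    // Note: parse() throws ParseException on malformed input.
    QueryParser parser = new QueryParser("content", new StandardAnalyzer());
    Query query = parser.parse("new york");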
You're absolutely right, in most cases there would never be a need to
increase the COMMIT_LOCK_TIMEOUT. In fact, if anything, you would
want to decrease it to prevent wait bottlenecks on a system with a heavy
update load. In short, it would be nice to have the option to change it
to suit
Nadav Har'El wrote:
Otis Gospodnetic [EMAIL PROTECTED] wrote on 12/06/2006 04:36:45
PM:
Nadav,
Look up one of my onjava.com Lucene articles, where I talk about
this. You may also want to tell Lucene to merge segments on disk
less frequently, which is what mergeFactor does.
Thanks. Can
Hi,
It seems analyzers never get called for UnTokenized fields (no luck
either using a PerFieldAnalyzer).
What should I do if I would like to use an analyzer for UnTokenized fields,
say for Keyword or UnStored fields?
I basically would like to use Lucene's Sort functionality on
I'm seeing query throughput of approx. 290 qps with OpenBitSet vs. 270 with
BitSet. I had to reduce the max. HashDocSet size to 2K - 3K (from 10K-20K)
to get optimal tradeoff.
no. docs in index: 730,000
average no. results returned: 40
average response time: 50 msec (15-20 for counting facets)
What you are asking is not possible. The whole purpose of the analyzer
is to tokenize the fields, so if you want a field analyzed don't use
the Keyword fields. If you want both tokenized and untokenized versions,
just create another field that will be tokenized.
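For example (a sketch; field names are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    String title = "Title 1";
    Document doc = new Document();
    // Untokenized copy: usable for sorting and exact matching.
    doc.add(new Field("title", title,
                      Field.Store.YES, Field.Index.UN_TOKENIZED));
    // Tokenized copy of the same text: usable for full-text search.
    doc.add(new Field("titleTokenized", title,
                      Field.Store.NO, Field.Index.TOKENIZED));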
Aviran
You'll need to run two queries. One for the user's query. Then if you
need to get the related books, collect all the related books from the
results, build a second query that will query the BookId field for all
the related books (create an OR query for all the related bookIDs).
Then merge the
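The second query might look like this (a sketch; the BookId field name and
SHOULD clauses for OR semantics are assumptions):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    // Ids collected from the "Related Book Ids" field of the first results.
    String[] relatedIds = { "A0003", "A0004" };
    BooleanQuery related = new BooleanQuery();
    for (int i = 0; i < relatedIds.length; i++) {
        // SHOULD clauses make this an OR over all related book ids.
        related.add(new TermQuery(new Term("BookId", relatedIds[i])),
                    BooleanClause.Occur.SHOULD);
    }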
Mordo, Aviran (EXP N-NANNATEK) wrote:
What you are asking is not possible. The whole purpose of the analyzer
is to tokenize the fields, so if you want them to be tokenized don't use
the Keyword fields.
Um, KeywordAnalyzer?
Is there a way to break up a single large index into many smaller indexes?
Dennis
See below
Hycel Taylor wrote:
Hi,
I'm new to lucene. I'm unsure as to how to use the QueryParser to
execute a search, retrieved from an end user request.
For example, if an end user enters the query new york and selects
content from a pull down list, I can easily translate that request
with
The KeywordAnalyzer does not do any analysis: it just returns the whole
phrase as a single term, just as if you didn't use an analyzer at all.
Aviran
http://www.aviransplace.com
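If you do want one field passed through whole while others are tokenized,
one pattern (a sketch; the "title" field is illustrative):

    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    // Tokenize everything with StandardAnalyzer, except "title", which
    // is passed through whole as a single term by KeywordAnalyzer.
    PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer("title", new KeywordAnalyzer());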
-Original Message-
From: Steven Rowe [mailto:[EMAIL PROTECTED]
Sent: Monday, June 12, 2006 1:50 PM
To:
See my note about overlapping the indexing of documents with index merging:
http://www.gossamer-threads.com/lists/lucene/java-user/34188?search_string=%2Bkeegan%20%2Baddindexes;#34188
Peter
On 6/12/06, Michael D. Curtin [EMAIL PROTECTED] wrote:
Nadav Har'El wrote:
Otis Gospodnetic [EMAIL PROTECTED]
Thanks for the help:-)
Hi Michael,
The Searcher you used to get the Hits needs to remain open while
accessing the hits. Your stack trace could have been caused by the
Searcher being closed first.
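In other words (a sketch; the path and query are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    IndexSearcher searcher = new IndexSearcher("/path/to/index");
    Query query = new TermQuery(new Term("title", "Lifecycle"));
    Hits hits = searcher.search(query);
    Document doc = hits.doc(0); // the searcher must still be open here
    // ... read fields from doc ...
    searcher.close();           // close only after you are done with the hits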
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
On 6/10/06, Michael Dodson [EMAIL
I'll experiment with both.
Thanks...
-Original Message-
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 07, 2006 2:16 AM
To: java-user@lucene.apache.org
Subject: Re: question with spellchecker
I think the problem in your particular example is the
suggestion software
I've been playing around with Lucene for a while now. I'm pretty
comfortable with creating an index and searching against it. Up until
now, I've been using the LuceneIndexAccessor package contributed by Maik
Schreiber and that's working well for me.
Now the next obstacle is to figure out
a billion? Wow! First, I really, really, really doubt you can use a RAMdir
to index a billion documents. I'd be interested in the parameters of your
problem, if you could share them. I'd be especially interested in providing
a home for any of your old hardware, since I bet it beats mine all to
hell <g>.
My approach, which I think is common, is to use the Quartz scheduler.
Chris
-
Instant Lucene Search on Any Databases/Applications
http://www.dbsight.net
On 6/12/06, Van Nguyen [EMAIL PROTECTED] wrote:
I've been playing around with Lucene for a while now. I'm pretty
Hi,
I have processed approximately 50 million documents so far. I kept
maxMergeFactor and maxBufferedDocs at 1000; I arrived at these values after
several rounds of test runs.
The indexing rate over those 50M documents is one document per 4.85 ms.
I am only using FSDirectory. Is there any other way to reduce this time?