On Monday 21 November 2005 02:16, [EMAIL PROTECTED] wrote:
Hi. I was wondering if anyone else has seen this
before. I'm using lucene 1.4.3 and have indexed
about 3000 text documents using the statement:
doc.add(Field.Text("contents", new FileReader(f), true));
When I go and retrieve the
By default, documents get truncated at 10,000 terms (maybe there is
an off-by-one where it is going to 10,001 though?).
To increase this (and I always do), set the max field length on your
IndexWriter and re-index. In 1.4.3, you set the maxFieldLength
variable of IndexWriter directly.
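In 1.4.3 the change is a one-line assignment on the writer; a minimal sketch, with the index path, analyzer choice, and limit value all illustrative:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Sketch for Lucene 1.4.3: maxFieldLength is a public instance
// variable on IndexWriter (default 10,000 terms per field).
IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
writer.maxFieldLength = 1000000; // illustrative limit
// ... re-add your documents so long fields are indexed past 10,000 terms ...
writer.close();
```

Note that the limit is per writer instance, so it must be set each time a new IndexWriter is created, before documents are added.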
Yonik,
Thanks for your carefully thought out and detailed reply.
On 20 Nov 2005, at 12:00, Yonik Seeley wrote:
Does it make sense to add an IndexWriter setting to
specify a default position increment gap to use when multiple fields
are added in this way?
Per-field might be nice...
The good
On 21 Nov 2005, at 04:26, Erik Hatcher wrote:
What about adding an offset to Field, setPositionOffset(int
offset)? Looking at DocumentWriter, it looks like this would be
the simplest thing that could work, without precluding the
interesting option of modifying Analyzer to allow with flags
Hi,
I am using lucene 1.4.3. The basic functionality of the search is
simple: put in the keyword "java" and it will display all the books
containing that keyword.
Now I have to add a feature which also shows the names of the top authors (let's
say the top 5 authors) with the number of
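Lucene 1.4.x has no built-in facet counting, so one common approach is to read a stored author field from each hit and tally the counts yourself. A minimal sketch of just the tallying step, with the Lucene retrieval elided and all names hypothetical:

```java
import java.util.*;

public class TopAuthors {
    // Given the author of each matching book, return the top-k
    // authors ranked by how many hits they account for.
    static List<String> topAuthors(List<String> hitAuthors, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String author : hitAuthors) {
            counts.merge(author, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue()); // descending by count
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

In practice the `hitAuthors` list would come from iterating the Hits object and reading a stored field from each document; for large result sets, iterating every hit this way can get expensive.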
Just to make sure that I understand this correctly,
the docs say:
By default, no more than 10,000 terms will be
indexed for a field.
Given your note, then the docs do not mean that no
more than 10,000 terms will be indexed, but that some
smaller number of terms will be indexed and only the
When I go and retrieve the term frequency vectors, for
any document under about 90k, everything looks as
expected. However for larger documents (I haven't
found the exact point, but I know that those over 128k
qualify) the sum of the term frequencies in the vector
seems to max out at 10001.
On 21 Nov 2005, at 08:37, Michael Curtin wrote:
That's probably because there is a limit built into Lucene where it
ignores any tokens in a field past the first 10,000. There is a
property you can set to increase this limit. I don't have the
source in front of me right now, but if you go
Hi,
I tried stressing Lucene in a controlled environment: one static
IndexSearcher for an index that doesn't change, and in same process I create
a number of Threads that call this Searcher concurrently for a limited time.
I expected the number of successful queries to increase when using more
To get a higher limit. Of course, you could also change the Lucene source
file and recompile it. Note that you CANNOT just set the property in your
code, in general, as the Lucene class puts it into a static final int,
meaning it examines the value of the property (once) at class load time.
Oren Shir wrote:
I tested this in versions 1.4.3 and 1.9rc1, and they behave the same in
this respect. 1.9rc1 is faster, but does not benefit from multi-threading.
Some newbie questions I have:
does 1.4.3 benefit from multi-threading?
is 1.9 the version in the source repository?
_gk
This is expected behavior: you are probably quickly becoming CPU bound
(which isn't a bad thing). More threads only help when some threads
are waiting on IO, or if you actually have a lot of CPUs in the box.
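The effect is easy to reproduce outside Lucene. A self-contained sketch (task sizes and counts are arbitrary) showing that purely CPU-bound work gains nothing from adding threads beyond the number of processors:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class CpuBoundDemo {
    // A stand-in for a "query" that is pure computation: no IO, no waiting.
    static long fakeQuery(long seed) {
        long h = seed;
        for (int i = 0; i < 100_000; i++) {
            h = h * 6364136223846793005L + 1442695040888963407L;
        }
        return h;
    }

    // Run a fixed amount of total work on the given number of threads
    // and return the elapsed wall-clock time in milliseconds.
    static long run(int threads, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong sink = new AtomicLong(); // keeps the JIT from discarding work
        long start = System.currentTimeMillis();
        for (int i = 0; i < tasks; i++) {
            final long seed = i;
            pool.execute(() -> { sink.addAndGet(fakeQuery(seed)); });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Past the CPU count, extra threads cannot shrink the elapsed time;
        // they only add scheduling overhead.
        System.out.println("1 thread:   " + run(1, 200) + " ms");
        System.out.println(cpus + " threads:  " + run(cpus, 200) + " ms");
        System.out.println((4 * cpus) + " threads: " + run(4 * cpus, 200) + " ms");
    }
}
```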
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 11/21/05, Oren Shir [EMAIL PROTECTED] wrote:
gekkokid,
does 1.4.3 benefit from multi-threading?
Sorry for not being clear. My tests show that both versions do not benefit
from multi-threading, but it is possible that I'm CPU bound, as Yonik kindly
reminded me.
is 1.9 the version in the source repository?
1.9 is the version in the source repository.
On 11/21/05, Oren Shir [EMAIL PROTECTED] wrote:
It is rather sad if 10 threads reach the CPU limit. I'll check it and get
back to you.
It's about performance and throughput though, not about number of
threads it takes to reach saturation.
In a 2 CPU box, I would say that the ideal situation is
Hi, Karl,
There have been quite a few discussions regarding the "too many open files"
problem. From my understanding, it is due to Lucene trying to open multiple
segments at the same time (during searching/merging segments), and the
operating system won't allow opening that many file handles.
If
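Two mitigations often suggested for 1.4.x are raising the OS file-handle limit (e.g. `ulimit -n` on Unix) and using the compound index format, which packs each segment's files into a single .cfs file. A hedged sketch of the latter, with the writer setup elided:

```java
// Sketch: the compound file format greatly reduces the number of
// files per segment; it may already be the default in 1.4.x.
writer.setUseCompoundFile(true);
```

Lowering the merge factor also keeps the segment count (and thus the open-file count) down, at some indexing-speed cost.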
On 11/21/05, Erik Hatcher [EMAIL PROTECTED] wrote:
Modifying Analyzer as you have suggested would
require that DocumentWriter additionally keep track of the field names
and note when one is used again.
For position increments, it doesn't have to be tracked. The patch to
DocumentWriter could also
On 21 Nov 2005, at 12:55, Yonik Seeley wrote:
On 11/21/05, Erik Hatcher [EMAIL PROTECTED] wrote:
Modifying Analyzer as you have suggested would
require that DocumentWriter additionally keep track of the field names
and note when one is used again.
For position increments, it doesn't have to be
: By default, no more than 10,000 terms will be
: indexed for a field.
:
: Given your note, then the docs do not mean that no
: more than 10,000 terms will be indexed, but that some
: smaller number of terms will be indexed and only the
: first 10,000 occurrences will be tallied.
It means that
Hi all,
Is there an index changed event that I can jump on that will
tell me when my index has been updated so I can close and reopen my
searcher to get the new changes?
I can't seem to find the event, but see some tools that might
accomplish this (DLESE DPC software components?).
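As far as I know there is no such event in Lucene 1.4.3; a common workaround is to poll the index version and reopen the searcher when it changes. A sketch, with the path and variable names illustrative:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Sketch: IndexReader.getCurrentVersion(...) reads the version stamp
// of the index on disk; compare it to the version seen at open time.
String indexPath = "/path/to/index";
long openedVersion = IndexReader.getCurrentVersion(indexPath);
IndexSearcher searcher = new IndexSearcher(indexPath);

// ... later, on a timer or before a batch of searches ...
long latest = IndexReader.getCurrentVersion(indexPath);
if (latest != openedVersion) {
    searcher.close();
    searcher = new IndexSearcher(indexPath);
    openedVersion = latest;
}
```

If other threads are still using the old searcher, closing it must be deferred until they finish.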
If I sort on a field called "sequence", but at document creation time I add in
// create doc A
doc.add(Field.Text("sequence", "32"));
doc.add(Field.Text("sequence", "3"));
doc.add(Field.Text("sequence", "932"));
// create doc B
doc.add(Field.Text("sequence", "1"));
doc.add(Field.Text("sequence", "300"));
On 21 Nov 2005, at 16:09, Yonik Seeley wrote:
The Analyzer extensions seem fine, but much more general purpose
than my need.
For your need (a global increment), isn't expanding analyzer
actually easier?
analyzer = new OldAnalyzer() {
    public int getPositionIncrementGap(String fieldName) {
        return 10; // illustrative gap between values of the same field
    }
};
On 11/21/05, Erik Hatcher [EMAIL PROTECTED] wrote:
Neither. It'll throw an exception.
Just don't rely on it to throw an exception either though... the
checking is not comprehensive.
One should treat sorting on a field with more than one value per
document as undefined.
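A hedged sketch of the usual workaround: give each document exactly one value for the sort field, indexed untokenized (Field.Keyword in 1.4.3), and zero-pad numbers since sorting compares strings lexicographically. Field name and padding width are illustrative:

```java
// One value per document; Field.Keyword keeps it as a single
// untokenized term. Zero-padding makes "32" sort before "300".
doc.add(Field.Keyword("sequence", "00032"));
```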
-Yonik