Was there any later thread on the QueryParser supporting open-ended
range queries after this:
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg07973.html
Just curious. I plan on overriding the current getRangeQuery() anyway,
since it currently doesn't run the endpoints through the analyzer.
On Apr 5, 2005 3:43 PM, Erik Hatcher [EMAIL PROTECTED] wrote:
On Apr 5, 2005, at 2:49 PM, Yonik Seeley wrote:
Just curious. I plan on overriding the current getRangeQuery() anyway
since it currently doesn't run the endpoints through the analyzer.
What will you do when multiple tokens
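The override being discussed would look something like this -- a sketch only; the class name and the take-the-first-token behavior are illustrative, and the multiple-token case is exactly the open question above:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;

public class AnalyzingRangeQueryParser extends QueryParser {
  private final Analyzer analyzer;

  public AnalyzingRangeQueryParser(String field, Analyzer analyzer) {
    super(field, analyzer);
    this.analyzer = analyzer;
  }

  // Run both endpoints through the analyzer before building the RangeQuery.
  protected Query getRangeQuery(String field, String part1, String part2,
                                boolean inclusive) throws ParseException {
    return new RangeQuery(new Term(field, analyze(field, part1)),
                          new Term(field, analyze(field, part2)),
                          inclusive);
  }

  // Keep only the first token the analyzer produces for an endpoint.
  private String analyze(String field, String text) throws ParseException {
    try {
      TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
      Token tok = ts.next();
      ts.close();
      return tok == null ? text : tok.termText();
    } catch (java.io.IOException e) {
      throw new ParseException(e.getMessage());
    }
  }
}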
I haven't tried it, but I think the fix should be easy... never throw
that exception. Either check for null before the loop, or in the
loop.
Original code for native int sorting:
TermEnum termEnum = reader.terms (new Term (field, ""));
try {
  if (termEnum.term() == null)
    throw new RuntimeException ("no terms in field " + field);
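One way to apply the check-for-null fix mentioned above -- a sketch of just the loop shape, with reader, field and retArray coming from the surrounding FieldCache code:

TermEnum termEnum = reader.terms(new Term(field, ""));
TermDocs termDocs = reader.termDocs();
try {
  // Instead of throwing when the field has no terms, just skip the loop;
  // the cached array keeps its default values.
  if (termEnum.term() != null) {
    do {
      Term term = termEnum.term();
      if (term == null || term.field() != field) break;  // field is interned here
      int termval = Integer.parseInt(term.text());
      termDocs.seek(termEnum);
      while (termDocs.next()) {
        retArray[termDocs.doc()] = termval;
      }
    } while (termEnum.next());
  }
} finally {
  termDocs.close();
  termEnum.close();
}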
An IndexReader is required to, given a term, find the document number to
mark deleted.
Yeah, most of the time it makes sense to do deletions off the
IndexReader. There are times, however, when it would be nice for
deletes to be able to be concurrent with adds.
Q: can docids change after an add()
Term.field is interned, so equals() isn't needed.
-Yonik
On 4/26/05, Peter Veentjer - Anchor Men [EMAIL PROTECTED] wrote:
[...]
Term other = (Term) o;
return field.equals(other.field)
    && text.equals(other.text);
}
Third: if the field values of refer to
I don't think at this point anything structural has been proposed as
different between 1.9 and 2.0.
Are any of Paul Elschot's query and scorer changes being considered for 2.0?
-Yonik
I can't say what's actually ready, but I am very interested in sparse
filter representations. I'm working on a project that needs to do
dynamic categorization of search results, and this requires caching
thousands of filters.
http://issues.apache.org/bugzilla/show_bug.cgi?id=32965
Once an IndexReader is opened on an index, its view of that index
never changes. Reuse the same IndexReader for all query requests and
only reopen it after you do your optimize.
-Yonik
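A rough sketch of that pattern (class and method names are mine):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

// Share one IndexSearcher across all query requests; swap in a new one
// only after the index has actually changed (e.g. after an optimize).
public class SearcherHolder {
  private final String indexDir;
  private IndexSearcher searcher;

  public SearcherHolder(String indexDir) throws IOException {
    this.indexDir = indexDir;
    this.searcher = new IndexSearcher(indexDir);
  }

  public synchronized IndexSearcher getSearcher() {
    return searcher;
  }

  // Call after IndexWriter.optimize()/close(); the old searcher can keep
  // serving in-flight requests and be closed once they finish.
  public synchronized void reopen() throws IOException {
    searcher = new IndexSearcher(indexDir);
  }
}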
When created, an IndexReader opens all the segment files and hangs
onto them. Any updates to the index through an IndexWriter (including
commit and optimize) will not affect already open IndexReaders.
-Yonik
On 5/11/05, Naomi Dushay [EMAIL PROTECTED] wrote:
It's my impression that with optimize
Why do we keep the lookup array around?
The actual field value is needed to sort results from multiple
searchers (multisearcher).
-Yonik
On 6/1/05, John Wang [EMAIL PROTECTED] wrote:
Hi:
In the current Lucene sorting implementation, FieldCache is used to
retrieve 2 arrays, the lookup
I use ConstantScoreRangeQuery for this purpose:
http://issues.apache.org/bugzilla/show_bug.cgi?id=34673
-Yonik
On 7/12/05, Rifflard Mickaël [EMAIL PROTECTED] wrote:
Hi all,
I've been using Lucene as a fulltext search engine for a year now and it
works well for this.
Now, I want to add
If all segments were flushed to the disk (no adds since the last time
the index writer was opened), then it seems like the index should be
fine.
The big question I have is what happens when there are in-memory
segments in the case of an OOM exception during an optimize? Is data
loss possible?
I think it would be 2 billion. There are many places that wouldn't
like the overflow to negative docids, I think...
We have indexes up to 200M documents, so 1/10th the max.
64 bit ids are definitely something to think about for the near future.
Who's got Lucene indexes nearing the maximum
I can verify that bad things are going on with backslashes and the
query parser in lucene 1.4.3
foo:hi\\ == foo:hi\
(foo:hi\\) == exception
foo:hi\\ == foo:hi\\
foo:hi\\^3 == foo:hi\^3
foo:hi \\ there == foo:hi \\ there
foo:'hi there' == foo:'hi
foo:\ == exception
foo:hi\ == foo:hi
So there
Does anyone have solutions for handling intraword delimiters (case
changes, non-alphanumeric chars, and alpha-numeric transitions)?
If the source text is Wi-Fi, we want to be able to match the following
user queries:
wi fi
wifi
wi-fi
wi+fi
WiFi
One way is to index wi, fi, and wifi.
However,
That was the plan, but step (4) really seems problematic.
- term expansion this way can lead to a lot of false matches
- phrase queries with many bordering words break
- setting term positions such that phrase queries work on all combos
of subwords is non-trivial (see the fragment below).
It seems like a better
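For what it's worth, the usual trick for overlaying subword and catenated tokens is a zero position increment, so wifi sits on top of wi; a fragment using the old Token API, emitted in this order, with offsets for the literal text Wi-Fi:

Token wi   = new Token("wi",   0, 2);
Token wifi = new Token("wifi", 0, 5);
wifi.setPositionIncrement(0);           // overlay "wifi" on the same position as "wi"
Token fi   = new Token("fi",   3, 5);   // next position, as usual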
It's the QueryParser, not the Analyzer.
When the query parser sees multiple tokens from what looks like a
single word, it puts them in a phrase query.
I think the only way to change that behavior would be to modify the
QueryParser.
-Yonik
On 8/23/05, Dan Armbrust [EMAIL PROTECTED] wrote:
I
The Hits object retrieves the documents lazily, so just ask it for the first
100.
-Yonik
On 9/7/05, haipeng du [EMAIL PROTECTED] wrote:
The reason that I want to limit returned result is that I do not want
to get out of memory problem. I index lucene with 3 million documents.
Sometimes,
You could create your own HitCollector that checked a flag on each hit, and
throw an exception if it was set.
In a separate thread, you could set the flag to cancel the search.
-Yonik
Now hiring -- http://tinyurl.com/7m67g
On 9/8/05, Kunemann Frank [EMAIL PROTECTED] wrote:
The problem is
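A sketch of such a collector (class name, exception type and the volatile flag are illustrative):

import org.apache.lucene.search.HitCollector;

public class CancellableCollector extends HitCollector {
  private volatile boolean cancelled = false;

  public void cancel() { cancelled = true; }   // call from another thread

  public void collect(int doc, float score) {
    if (cancelled) {
      // An unchecked exception unwinds out of IndexSearcher.search()
      throw new RuntimeException("search cancelled");
    }
    // ... otherwise record doc and score as usual ...
  }
}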
The Hits class collects the document ids from the query in batches. If you
iterate beyond what was collected, the query is re-executed to collect more
ids.
You can use the expert level search methods on IndexSearcher if this isn't
what you want.
-Yonik
On 9/8/05, Richard Krenek [EMAIL
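The expert-level call looks roughly like this (fragment, 1.4-era API; searcher and query assumed to exist already):

// Fetch exactly the top 100 in one pass instead of iterating a Hits object.
TopDocs top = searcher.search(query, (Filter) null, 100);
for (int i = 0; i < top.scoreDocs.length; i++) {
  ScoreDoc sd = top.scoreDocs[i];
  Document d = searcher.doc(sd.doc);
  // ... use d and sd.score ...
}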
Nope. The IndexReader simply sets a bit in a separate bitvector that marks
the doc as deleted. All info associated with the document is removed after
an IndexWriter merges the segment containing that doc with another (optimize
will merge all segments and hence remove remnants of all deleted docs).
On 9/10/05, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Ok... but can I search in documents which are marked for deletion?
Bye
--- Original message ---
From: Yonik Seeley [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Subject: Re: IndexReader delete doc! delete
I'm trying to figure out why idf is multiplied twice into the score of a
term query.
It sort of makes sense if you have just one term... the original weight is
idf*boost, and the normalization factor is 1/(idf*boost), so you multiply in
the idf again if you want the final score to contain an idf factor.
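Spelling that out (my reading of the DefaultSimilarity math, simplified, with lengthNorm and coord folded into norm):

score(q,d) = \sum_{t \in q} \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{boost}(t) \cdot \mathrm{queryNorm}(q) \cdot \mathrm{norm}(t,d)

\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\sum_{t \in q} (\mathrm{idf}(t)\,\mathrm{boost}(t))^2}}

For a single-term query with boost b, queryNorm = 1/(idf*b), so the score collapses to tf * idf * norm(d): only one idf factor survives. With several terms, each clause contributes roughly in proportion to idf^2 relative to the others.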
I just updated a bug via JIRA,
http://issues.apache.org/jira/browse/LUCENE-383
and I didn't see it come to any mailing list like it used to with bugzilla.
Should it have? Is there a new mailing list to sign up for?
-Yonik
Now hiring -- http://tinyurl.com/7m67g
You don't get the boost back directly... it's folded into the norm for the
field and does affect the score when you search against the index.
-Yonik
On 9/21/05, Steve Gaunt [EMAIL PROTECTED] wrote:
Hi all,
I was hoping someone could shed some light on this?
When I set a boost for a
I think your best bet for supporting Java 1.3 would be sticking with Lucene
1.4.
One of the new classes that I am using is the ConstantScoreQuery. I am not
sure if this is going to be included in Lucene 1.9 or not but this does
make use of Java 1.4.
w.r.t. java.util.BitSet, it's a pain, and I
Field length isn't stored... it gets folded into the norm (see
Similarity.lengthNorm) along with the boost at indexing time.
A couple of approaches:
a) index the field twice with two different Similarity implementations
b) store term vectors, derive the length from them and store in the
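For option (a), the second pass could use a Similarity whose lengthNorm encodes the raw token count rather than the usual 1/sqrt(numTokens); keep in mind norms are squeezed into 8 bits, so the recovered length is only approximate (sketch, class name is mine):

import org.apache.lucene.search.DefaultSimilarity;

// Set on the IndexWriter only while indexing the "length" copy of the field.
public class LengthSimilarity extends DefaultSimilarity {
  public float lengthNorm(String fieldName, int numTokens) {
    return numTokens;   // store the token count itself in the norm
  }
}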
See IndexWriter.setMaxFieldLength()
-Yonik
Now hiring -- http://tinyurl.com/7m67g
On 10/3/05, Tricia Williams [EMAIL PROTECTED] wrote:
To follow up on my post from Thursday. I have written a very basic test
for TermPositions. This test allows me to identify that only the
first 10001 tokens
ids can also change as the result of an add(), not just optimize(). An add
can trigger a segment merge which can squeeze out deleted docs and thus
change the ids. I think everything else you said is pretty much correct.
On 10/6/05, Jack McBane [EMAIL PROTECTED] wrote:
I know that in general if
I'm not sure that looks like a safe patch.
Synchronization does more than help prevent races... it also introduces
memory barriers.
Removing synchronization on objects that can change is very tricky business
(witness the double-checked locking antipattern).
-Yonik
, Yonik Seeley [EMAIL PROTECTED] wrote:
We've been using this in production for a while and it fixed the
extremely slow searches when there are deleted documents.
Who was the caller of isDeleted()? There may be an opportunity for an easy
optimization to grab the BitVector and reuse
Here's the patch:
http://issues.apache.org/jira/browse/LUCENE-454
It resulted in quite a performance boost indeed!
On 10/12/05, Yonik Seeley [EMAIL PROTECTED] wrote:
Thanks for the trace Peter, and great catch!
It certainly does look like avoiding the construction of the docMap
It can...
By the time the hitcollector is called, the documents are already scored, so
you don't save any time there. But since they haven't been sorted yet, you
do save the time it would take to put all the hits through the priority
queue to find the top n.
-Yonik
On 10/21/05, Volodymyr
1) make sure the failure was due to an OutOfMemory exception and not
something else.
2) if you have enough memory, increase the max JVM heap size (-Xmx)
3) if you don't need more than 1.5G or so of heap, use the 32 bit JVM
instead (depending on architecture, it can actually be a little faster
The closest thing to that is
http://issues.apache.org/jira/browse/LUCENE-330
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 10/21/05, Rick Hillegas [EMAIL PROTECTED] wrote:
I have another newbie question based on a quick glance at some classes
in org.apache.lucene.search.Query
I'm not sure what type of score you are trying to do, but maybe
FunctionQuery would help.
http://issues.apache.org/jira/browse/LUCENE-446
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 10/22/05, Jeff Rodenburg [EMAIL PROTECTED] wrote:
I have a custom sort that completes
On 10/22/05, Jeff Rodenburg [EMAIL PROTECTED] wrote:
This is really interesting, I haven't revved our code to this version yet.
Does the score returned by FunctionQuery supersede underlying relevance
scoring or is it rolled in at some base class?
-- j
On 10/22/05, Yonik Seeley
With respect to different terms in a boolean query, they will contribute to
the total score proportional to idf^2, so I think the javadoc as it exists
now is probably more correct.
A single TermQuery will have a final score with a single idf factor in it,
but that's because of the queryweight
To be more literal, I actually meant explain(query,hits.id(i))
On 10/26/05, Yonik Seeley [EMAIL PROTECTED] wrote:
Typo... try explain(query,doc) instead of (query,i)
:-)
Hi Bill,
I can't seem to correctly parse it either...
Format = FF FF FF FF
Version = 00 00 00 00 00 00 00 28
SegCount = 00 00 00 4E
= 00 00 00 04
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 10/26/05, Bill Tschumy [EMAIL PROTECTED] wrote:
I have been trying to reconstitute
There is a currently undocumented extra int32.
Here's the code for writing the segment file:
output.writeInt(FORMAT); // write FORMAT
output.writeLong(++version); // every write changes the index
output.writeInt(counter); // write counter
output.writeInt(size()); // write infos
for (int i = 0; i < size(); i++) {
  SegmentInfo si = info(i);
  output.writeString(si.name);   // write segment name
  output.writeInt(si.docCount);  // write segment doc count
}
Lucene 1.2 is before my time, but check if the functions are
implemented the same as the current version (they probably are).
Scores are not naturally <= 1, but for most search methods (including
all that return Hits) they are normalized to be between 1 and 0 if the
highest score is greater than 1.0.
On 11/5/05, Sameer Shisodia [EMAIL PROTECTED] wrote:
if so, the top score should always be 1.0. Isn't that so?
Or does boosting multiple individual fields wreck that ?
sameer
The top score is scaled back to 1.0 *only* if it's greater than 1.0.
So hits with scores of 4.0, 2.0 will be normalized to 1.0, 0.5.
The limited number of terms in a range query should hopefully be
addressed before Lucene 1.9 comes out.
I'd give you a reference to the bug, but JIRA seems like it's
currently down. Search for ConstantScoreRangeQuery if interested.
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
There really isn't a generic way... you have to search for the document.
If you have a unique id field in your document, you can find the
document id quickly via IndexReader.termDocs(term)
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 11/9/05, [EMAIL PROTECTED] [EMAIL PROTECTED]
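For example, with a unique "id" field (fragment; reader is an open IndexReader):

// Look up the internal document number for the doc whose id is "42".
TermDocs td = reader.termDocs(new Term("id", "42"));
try {
  int docNum = td.next() ? td.doc() : -1;
  // Only valid for this reader, and only until a merge squeezes out deletes.
} finally {
  td.close();
}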
The FieldCache (which is used for sorting), uses arrays of size
maxDoc() to cache field values. String sorting will involve caching a
String[] (or StringIndex) and int sorting will involve caching an
int[]. Unique string values are shared in the array, but the String
values plus the String[]
Here is a snippet of the current StringIndex class:
public static class StringIndex {
/** All the term values, in natural order. */
public final String[] lookup;
/** For each document, an index into the lookup array. */
public final int[] order;
}
The order field is used for
The IndexSearcher(MultiReader) will be faster (it's what's used for
indices with multiple segments too).
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 11/11/05, Mike Streeton [EMAIL PROTECTED] wrote:
I have several indexes I want to search together. What performs better a
single
Look at IndexReader.open()
It actually uses a MultiReader if there are multiple segments.
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 11/11/05, Charles Lloyd [EMAIL PROTECTED] wrote:
You should run your own tests, but I found the MultiReader to be slower
than a regular
Do you have any deletions in the non-optimized version of the index?
If so, a bug was fixed recently that made for some very slow queries:
http://issues.apache.org/jira/browse/LUCENE-454
You could also try a smaller mergeFactor, which would slow indexing,
but decrease the number of segments, and
Right. getBoost() is meaningless on retrieved documents (it isn't set
when a doc is read from the index).
There really should have been a separate class for documents retrieved
from an index vs documents added... but that's water way under the
bridge.
-Yonik
On 11/17/05, Erik Hatcher [EMAIL
Does it make sense to add an IndexWriter setting to
specify a default position increment gap to use when multiple fields
are added in this way?
Per-field might be nice...
The good news is that Analyzer is an abstract class, and not an
Interface, so we could add something to it without
It depends on
Document.fields() of a stored and retrieved document: does it return
all the appended field parts as separate Fields, or does it only
return one Field with all parts appended?
Separate fields. Stored fields are returned back to you verbatim.
-Yonik
I haven't done measurements, but the first query with a sort on a
particular field will involve filling the field-cache and that can
take a while (especially for numeric fields).
If you haven't already, you should compare the query times of a
warmed searcher. Sorted queries will still take
On 11/20/05, Jeff Rodenburg [EMAIL PROTECTED] wrote:
Why are numeric fields more onerous in filling the field-cache?
Float.parseFloat() or Integer.parseInt() for each unique term.
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
Karl,
You are opening IndexSearchers in this code but not closing them.
If GC finalizers don't happen to run before you run out of file
handles, you will get exceptions.
You could close the IndexSearcher after every request, but it would
lead to very poor performance. Better to keep a single
This is expected behavior: you are probably quickly becoming CPU bound
(which isn't a bad thing). More threads only help when some threads
are waiting on IO, or if you actually have a lot of CPUs in the box.
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 11/21/05, Oren Shir [EMAIL
On 11/21/05, Oren Shir [EMAIL PROTECTED] wrote:
It is rather sad if 10 threads reach the CPU limit. I'll check it and get
back to you.
It's about performance and throughput though, not about number of
threads it takes to reach saturation.
In a 2 CPU box, I would say that the ideal situation is
On 11/21/05, Erik Hatcher [EMAIL PROTECTED] wrote:
Modifying Analyzer as you have suggested would
require DocumentWriter additionally keep track of the field names
and note when one is used again.
For position increments, it doesn't have to be tracked. The patch to
DocumentWriter could also
On 11/21/05, Erik Hatcher [EMAIL PROTECTED] wrote:
Neither. It'll throw an exception.
Just don't rely on it to throw an exception either though... the
checking is not comprehensive.
One should treat sorting on a field with more than one value per
document as undefined.
-Yonik
And of course Doug still does a lot of work on Lucene, but often
leaves the commit to someone else.
On 11/22/05, Daniel Naber [EMAIL PROTECTED] wrote:
On Tuesday 22 November 2005 19:33, aurora wrote:
(http://www.javarants.com/B1823453972/C1460559707/E20051119163857/index.html). Lucene is
G, I think it's that AUTO sorting again...
Check out this bug:
http://issues.apache.org/jira/browse/LUCENE-463
If you specify a string sort explicitly, it should work.
If you are using a multisearcher, please upgrade to the latest lucene
version (there have been some sorting bug fixes).
I finally got around to updating FunctionQuery:
http://issues.apache.org/jira/browse/LUCENE-446
Comments and suggestions welcome.
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
I checked out readVInt() to see if I could optimize it any...
For a random distribution of integers < 200 I was able to speed it up a
little bit, but nothing to write home about:
                old     new     percent
Java14-client:  13547   12468   8%
Java14-server:  6047    5266    14%
The only problems I've had with 1.5 JVM crashes and Lucene was related
to stack overflow... try increasing the stack size and see if anything
different happens.
My crashes happened while trying to use Luke to open a 4GB index with
thousands of indexed fields.
-Yonik
Sounds like it's a hotspot bug.
AFAIK, hotspot doesn't just compile a method once... it can do
optimization over time.
To work around it, have you tried the previous version: 1.5_05?
It's possible it's a fairly new bug. We've been running with that
version and Lucene 1.4.3 without problems (on
You also might try -Xbatch or -Xcomp to see if that fixes it (or
reproduces it faster).
Here's a great list of JVM options:
http://blogs.sun.com/roller/resources/watt/jvm-options-list.html
-Yonik
On 12/11/05, Yonik Seeley [EMAIL PROTECTED] wrote:
Sounds like it's a hotspot bug.
AFAIK, hotspot
On 12/14/05, Chuck Williams [EMAIL PROTECTED] wrote:
If there is some specific reason it is not deemed suitable
to commit, please let me know. It is much harder to use
DisjunctionMaxQuery without this parser.
Hey Chuck,
I committed DisjunctionMaxQuery after I took the time to understand
it,
Are you using the same Analyzer for both indexing and querying (or the
same StopFilter at least)?
-Yonik
On 12/15/05, javier muguruza [EMAIL PROTECTED] wrote:
Hi,
Suppose I have a query like this:
+attachments:purpose
that returns N hits.
If I add another condition
+attachments:purpose
I can't reproduce this behavior with the current version of Lucene.
+text:solar = 112 docs
+text:"a a a" = 0 docs because a is a stop word
+text:solar +text:"a a a" = 112 docs
-Yonik
On 12/15/05, javier muguruza [EMAIL PROTECTED] wrote:
Hi,
Suppose I have a query like this:
W.r.t. ConstantScoreQuery, it contains a minor bug: it doesn't handle
the case where the Filter.bits method would return null.
Can Filter.bits() ever return null though? AFAIK, that's not in the contract.
The Filter.bits() javadoc says:
Returns a BitSet with true for
On 12/20/05, John Powers [EMAIL PROTECTED] wrote:
I would like to be able to search for 19 inches with the quote. So I get a
query like this:
Line 1: +( (name:19*^4 ld:19*^2 sd:19*^3 kw:19*^1) )
That won't work, so I wanted to escape the quotes. The docs said to use a
backslash. So
Here's more on query-parser escaping gotchas:
http://www.mail-archive.com/java-user@lucene.apache.org/msg02354.html
-Yonik
On 12/20/05, John Powers [EMAIL PROTECTED] wrote:
Ok, I understand the .toString() part.
But, if I have some 19 in the text of these items, and I do a search with
19, that has been escaped before parsing, why am I not getting anything?
The indexer analyzer took them out? So then to find
That shouldn't happen.
What platform(s) have you seen this on, and with what Lucene versions?
-Yonik
On 12/27/05, Chris Lu [EMAIL PROTECTED] wrote:
This is generally true, most of the time.
But my experience is, there can be some FileNotFoundException, if your
searcher is opened for a while,
That's a Lucene 1.4 limitation, gone in the latest 1.9 development version.
If you want to stick with 1.4, try restructuring your query to avoid
this restriction.
-Yonik
On 12/27/05, Alex Kiselevski [EMAIL PROTECTED] wrote:
I got a strange exception More than 32 required/prohibited clauses in
svn checkout http://svn.apache.org/repos/asf/lucene/java/trunk lucene
cd lucene
ant
-Yonik
On 12/27/05, Alex Kiselevski [EMAIL PROTECTED] wrote:
I didn't find a mention about 1.9 version in Lucene site
-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 27, 2005 4:52 PM
To: java-user
Off the top of my head:
1) also index the field untokenized and use a straight prefix query
2) index a magic token at the start of the title and include that in a
phrase query:
_START_ the quick
3) use a SpanFirst query (but you have to make the Java Query object yourself; see the fragment below)
-Yonik
On 1/5/06,
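Option (3) above, as a fragment (the field name and position limit are illustrative):

// Match documents with "quick" somewhere in the first 2 positions of the title.
SpanTermQuery term = new SpanTermQuery(new Term("title", "quick"));
Query q = new SpanFirstQuery(term, 2);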
Check out PhrasePrefixQuery.
-Yonik
On 1/5/06, Paul Smith [EMAIL PROTECTED] wrote:
first off response to my own post, I meant PhraseQuery instead.
But, since we're only tokenizing this field ,and not storing the
entire contents of the field, I'm not sure this is ever going to
work, is it?
That's deprecated now of course... so you want MultiPhraseQuery.
-Yonik
On 1/5/06, Yonik Seeley [EMAIL PROTECTED] wrote:
Check out PhrasePrefixQuery.
-Yonik
On 1/5/06, Paul Smith [EMAIL PROTECTED] wrote:
first off response to my own post, I meant PhraseQuery instead.
But, since we're
Should we detect the case of all negative clauses and throw in
a MatchAllDocsQuery?
I guess this would be done in the QueryParser, but one could also make
a case for doing it in the BooleanQuery.
-Yonik
On 1/6/06, Erik Hatcher [EMAIL PROTECTED] wrote:
With Lucene's trunk, there is a
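For example, with the trunk/1.9-style BooleanClause.Occur API (sketch):

// A purely negative query matches nothing by itself; the MatchAllDocsQuery
// gives the prohibited clause something to subtract from.
BooleanQuery bq = new BooleanQuery();
bq.add(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("type", "spam")), BooleanClause.Occur.MUST_NOT);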
The actual fields of found documents are not prefetched, only the ids.
And imagine that the user is on the fourth
page - reading the first 100 documents is a waste of time.
As it relates to document ids, you must know what the first 100 are if
you are to know which ones follow.
If you want more control
+1 from me.
-Yonik
On 1/7/06, Erik Hatcher [EMAIL PROTECTED] wrote:
+1 to Hoss's suggested enhancement to QueryParser.
I'll volunteer to implement this barring any objections in the next
day or so.
Erik
Are you using the latest version of Lucene (after Dec 8th)? There was
a bug fix regarding this:
http://issues.apache.org/jira/browse/LUCENE-479
-Yonik
On 1/8/06, Koji Sekiguchi [EMAIL PROTECTED] wrote:
Hello Luceners!
steps:
1. index has 15 docs and has no deleted docs
2. call
Closing the reader that did the deletion causes the deletions to be
flushed to the index.
After that point, any new readers you open will see the deletions.
Any old index readers that were opened before the deleting reader was
closed will still see the old version of the index (without the
Lock files aren't contained in the index directory, but in the
standard temp directory.
remove the file referenced in the exception:
C:\DOCUME~1\harini\LOCALS~1\Temp\lucene-1b92bc48efc5c13ac4ef4ad9fd17c158-commit.lock
-Yonik
On 1/9/06, Harini Raghavan [EMAIL PROTECTED] wrote:
Hi All,
All of a
Click on Source Repository off of the main Lucene page.
Here is a pointer to the search package containing TermQuery/Weight/Scorer
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/?sortby=file#dirlist
Look in TermQuery for TermWeight (it's an inner class).
A phrase query with slop scores matching documents higher when the
terms are closer together.
"a b c"~1
-Yonik
On 1/10/06, Eric Jain [EMAIL PROTECTED] wrote:
Is there an efficient way to determine if two or more terms frequently
appear next to each other in sequence? For a query like:
a b c
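Programmatically, the slop version of that query is roughly (field name assumed):

// Equivalent of the query syntax "a b c"~1 on the "text" field.
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("text", "a"));
pq.add(new Term("text", "b"));
pq.add(new Term("text", "c"));
pq.setSlop(1);   // documents where the terms sit closer together score higher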
On 1/12/06, Kan Deng [EMAIL PROTECTED] wrote:
Many thanks, Doug.
A quick question, which class implements the following
logic?
It looks to me like org.apache.lucene.index.TermInfosReader
-Yonik
Check out minNrShouldMatch in BooleanQuery in the latest lucene
version (1.9 dev version in subversion).
-Yonik
On 1/19/06, Anton Potehin [EMAIL PROTECTED] wrote:
Suppose that the search query contains 20 terms. It is necessary to find
all documents which contain at least 5 terms from the search
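A sketch (field name and word list are assumptions, and the setter spelling setMinimumNumberShouldMatch is from later releases; in the 1.9 dev tree the knob may be exposed differently):

BooleanQuery bq = new BooleanQuery();
String[] words = queryString.split(" ");   // the ~20 search terms (assumed)
for (int i = 0; i < words.length; i++) {
  bq.add(new TermQuery(new Term("body", words[i])), BooleanClause.Occur.SHOULD);
}
bq.setMinimumNumberShouldMatch(5);         // require at least 5 of them to match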
Are you certain? I am quite sure we retrieve a huge amount of data if there
are thousands of matches to one query.
-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Thu 2006-01-19 16:45
To: java-user@lucene.apache.org
Subject: Re: Limiting hits
If you didn't want to store term vectors you could also run the
document fields through the analyzer yourself and collect the Tokens
(you should still have the fields you just indexed... no need to
retrieve them again).
-Yonik
On 1/20/06, Klaus [EMAIL PROTECTED] wrote:
In my case, i need to
It's not in subversion yet though ;-)
You have to look here:
http://issues.apache.org/jira/browse/LUCENE-446
I haven't committed it, because we may be able to do better (maybe
removing the difference between Query and ValueSource so you could
freely mix the two and not have to wrap ValueSource
Thanks Peter, that's useful info.
Just out of curiosity, what kind of box is this? what CPUs?
-Yonik
On 1/25/06, Peter Keegan [EMAIL PROTECTED] wrote:
This is just fyi - in my stress tests on an 8-cpu box (that's 8 real cpus),
the maximum throughput occurred with just 4 query threads. The
On 1/25/06, Peter Keegan [EMAIL PROTECTED] wrote:
It's a 3GHz Intel box with Xeon processors, 64GB ram :)
Nice!
Xeon processors are normally hyperthreaded. On a linux box, if you
cat /proc/cpuinfo, you will see 8 processors for a 4 physical CPU
system. Are you positive you have 8 physical
Hmmm, can you run the 64 bit version of Windows (and hence a 64 bit JVM?)
We're running with heap sizes up to 8GB (RH Linux 64 bit, Opterons,
Sun Java 1.5)
-Yonik
On 1/26/06, Peter Keegan [EMAIL PROTECTED] wrote:
Paul,
I tried this but it ran out of memory trying to read the 500Mb .fdt file.
threads, which
is pretty impressive. Another way around the concurrency limit is to run
multiple jvms. The throughput of each is less, but the aggregate throughput
is higher.
Peter
On 1/26/06, Yonik Seeley [EMAIL PROTECTED] wrote:
Hmmm, can you run the 64 bit version of Windows (and hence