I've noticed that after stress-testing my application (uses Lucene 2.0) for
a while, I have almost 200 MB of byte[]s hanging around, the top two
culprits being:
24 x SegmentReader.Norm.bytes = 112 MB
2 x SegmentReader.ones = 16 MB
The second one isn't a big deal, but I wonder what's the
Yonik Seeley wrote:
It's read on demand, per indexed field.
So assuming your index is optimized (a single segment), then it
increases by one byte[] each time you search on a new field.
OK, makes sense then. Thanks!
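For what it's worth, the numbers above are consistent with norms costing one byte per document per indexed field, loaded lazily the first time each field is searched. A back-of-the-envelope sketch (the class name and the document count are made up for illustration):

```java
public class NormMemory {
    // Norms take one byte per document per indexed field (per segment),
    // read on demand the first time each field is searched.
    static long normBytes(long numDocs, int numIndexedFields) {
        return numDocs * numIndexedFields;
    }

    public static void main(String[] args) {
        // Hypothetical index: ~4.9 million documents and 24 fields
        // searched so far roughly reproduces the 112 MB observed above.
        System.out.println(normBytes(4_900_000L, 24) / (1024 * 1024) + " MB");
    }
}
```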
Chris Nokleberg wrote:
I am using the QueryParser with a StandardAnalyzer. I would like to avoid
or auto-correct anything that would lead to a ParseException. For example,
I don't think you can get a parse exception from Google--even if you omit
a closing quote it looks like it just closes it
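One low-tech way to get the Google-like behaviour is to balance the quotes yourself before the string ever reaches QueryParser. A minimal sketch (plain string pre-processing, not a Lucene API; the class and method names are made up):

```java
public class QueryCleaner {
    // Append a closing quote if the query contains an odd number of
    // double quotes, so the parser never sees an unterminated phrase.
    static String balanceQuotes(String query) {
        int quotes = 0;
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) == '"') quotes++;
        }
        return (quotes % 2 == 0) ? query : query + '"';
    }

    public static void main(String[] args) {
        System.out.println(balanceQuotes("\"fried eggs")); // "fried eggs"
    }
}
```

Unbalanced parentheses would need similar treatment; this sketch handles quotes only.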
Chris Hostetter wrote:
The only useful callback/listener abstractions I can think of are when you
want to know if someone has finished with a set of changes -- whether that
change is adding one document, deleting one document, or adding/deleting a
whole bunch of documents isn't really relevant,
Erik Hatcher wrote:
On Apr 28, 2006, at 5:35 AM, Eric Jain wrote:
What is the best way to prevent a phrase query such as "eggs white"
matching "fried eggs\nwhite snow"?
Two possibilities I have thought about:
1. Replace all line breaks with a special string, e.g. newline.
2. Have an analyzer somehow increment the position of a term for each line
break it encounters.
thomasg wrote:
1) By default, Lucene only indexes the first 10,000 words of each
document. When this default is increased, out-of-memory errors can occur. This
implies that documents, or large sections thereof, are loaded into memory.
ISYS has a very small memory footprint which is not affected
Florian Hanke wrote:
I'd like to append an * (create a WildcardQuery) to each search term in
a query, such that a query that is entered as e.g. term1 AND term2 is
modified (effectively) to term1* AND term2*.
Parsing the search string is not very elegant (of course). I'm thinking
that
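If rewriting the search string is acceptable despite its inelegance, a regex pass can do it. A minimal sketch (all names made up; quoted phrases, fielded terms, and existing wildcards are deliberately not handled):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardRewriter {
    // Append '*' to each bare term, leaving the boolean operators
    // AND, OR, and NOT untouched.
    static String addWildcards(String query) {
        Pattern term = Pattern.compile("\\b(?!AND\\b|OR\\b|NOT\\b)(\\w+)\\b");
        Matcher m = term.matcher(query);
        return m.replaceAll("$1*");
    }

    public static void main(String[] args) {
        System.out.println(addWildcards("term1 AND term2")); // term1* AND term2*
    }
}
```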
[EMAIL PROTECTED] wrote:
When I make a search I get count = 37.
Maybe I am doing something incorrectly?
I assume you ran both variants repeatedly, in the same process (start-up
costs etc.)?
Anton Potehin wrote:
Currently I create a new search to get the number of results. For example:
IndexSearcher is = ...
Query q = ...
numberOfResults = is.search(q).length();
Can I speed this up? And how?
Perhaps something like:
class CountingHitCollector extends HitCollector {
  private int count = 0;
  public void collect(int doc, float score) { count++; }
  public int getCount() { return count; }
}
Anton Potehin wrote:
After it I want to not make a new search,
I want to make search among found results...
Perhaps something like this would work:
final BitSet results = toBitSet(hits);
searcher.search(newQuery, new Filter() {
  public BitSet bits(IndexReader reader) {
    return results;
  }
});
Daniel Naber wrote:
Please try to add this to MultiPhraseQuery and let us know if it helps:
public List getTerms() {
return termArrays;
}
That is indeed all I need (the list wouldn't have to be mutable, though).
Any chance this could be committed?
Incidentally, it would be helpful if
I need to write a function that copies a MultiPhraseQuery and changes the
field the query applies to. Unfortunately the API allows access to neither
the contained terms nor the field! The other query classes I have so far
dealt with all seem to allow access to the contained query terms...
I've noticed that while the QueryParser (both the default QueryParser and
the PrecedenceQueryParser) refuse to parse
foo bar) baz
they both seem to interpret
foo bar( baz
as
foo bar
Bug or feature?
In any case, it would be great if there were a strict mode and a more
lenient mode
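Until such a lenient mode exists, one workaround is to pre-process the string before it reaches the parser. A sketch (plain string handling, nothing Lucene-specific; the class name is made up) that drops unmatched parentheses:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ParenStripper {
    // Remove unmatched parentheses: a ')' with no open '(' before it,
    // or a '(' that is never closed.
    static String stripUnbalanced(String query) {
        boolean[] drop = new boolean[query.length()];
        Deque<Integer> opens = new ArrayDeque<>();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '(') opens.push(i);
            else if (c == ')') {
                if (opens.isEmpty()) drop[i] = true;  // stray close
                else opens.pop();
            }
        }
        for (int i : opens) drop[i] = true;  // unclosed opens
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            if (!drop[i]) sb.append(query.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripUnbalanced("foo bar) baz")); // foo bar baz
        System.out.println(stripUnbalanced("foo bar( baz")); // foo bar baz
    }
}
```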
Eric Jain wrote:
I'll rerun the indexing
procedure with the old version overnight, just to be sure.
Just to confirm: There no longer seems to be any difference in indexing
performance between the nightly build and 1.4.3
Yonik Seeley wrote:
Solr is a new open-source search server that's based on Lucene, and
has XML/HTTP interfaces for updating and querying, declarative
specification of analyzers and field types via a schema, extensive
caching, replication, and a web admin interface.
Just had a look, quite
Daniel Naber wrote:
A fix has now been committed to trunk in SVN, it should be part of the next
1.9 release.
Performance seems to have recovered, more or less, thanks!
Otis Gospodnetic wrote:
Regarding the performance fix -- if you can be more precise (is it really
just "more or less", or is it as good as before?), that would be great
for those of us itching to use 1.9.
Yes, I can confirm that performance differs by no more than 3.1 fraggles.
;-)
Doug Cutting wrote:
If you use a span query then you can get the actual number of phrase
instances.
Thanks, good to know!
In this case (I need to suggest phrase queries to the user) I've now settled
on dividing the number of hits for a potential phrase by the number of
documents that
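The ratio itself is trivial to compute; a sketch with made-up counts (not from any real index, and the names are hypothetical):

```java
public class PhraseSuggester {
    // Score a candidate phrase by how often its terms co-occur as a
    // phrase relative to how often the rarest term appears at all.
    static double phraseScore(int phraseDocs, int minTermDocs) {
        return (double) phraseDocs / minTermDocs;
    }

    public static void main(String[] args) {
        // Hypothetical counts: the phrase occurs in 800 documents,
        // its rarest term in 1000, suggesting a strong phrase.
        System.out.println(phraseScore(800, 1000)); // 0.8
    }
}
```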
Dave Kor wrote:
Not sure if this is what you want, but what I have done is to issue
exact phrase queries to Lucene and counted the number of hits found.
This gives you the number of documents containing the phrase, rather than
the number of occurrences of the phrase itself, but that may in
This is somewhat related to a question sent to this list a while ago: Is
there an efficient way to count the number of occurrences of a phrase (not
term) in an index?
I need to parse a query string, modify it a bit, and then output the
modified query string. This works quite well with query.toString(), except
that when I parse the query I set DEFAULT_OPERATOR_AND, and the output of
BooleanQuery.toString() assumes DEFAULT_OPERATOR_OR... Would be great if
Chris Hostetter wrote:
(Assuming *I* understand it) what he's talking about is the ability for
his search GUI to display suggested phrase searches you may want to try
which consist of the words you just typed in grouped into phrases.
Yes, that's precisely what I am talking about. Sorry for
Paul Elschot wrote:
One way that might be better is to provide your own Scorer
that works on the term positions of the three or more terms.
This would be better for performance because it only uses one
term positions object per query term (a, b, and c here).
I'm trying to extract the actual
Paul Elschot wrote:
In case you prefer to use the maximum score over the clauses you
can use the DisjunctionMaxQuery from the development version.
Yes, that may help! I'll need to have a look...
Is there an efficient way to determine if two or more terms frequently
appear next to each other in sequence? For a query like:
a b c
one or more of the following suggestions could be generated:
"a b" c
a "b c"
"a b c"
I could of course just run a search with all possible combinations, but
perhaps
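The "possible combinations" here are just the 2^(n-1) ways of grouping adjacent terms into phrases (each gap between terms is either inside a phrase or a boundary). A sketch that enumerates them (class and method names made up):

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseGroupings {
    // Generate every way of grouping adjacent query terms into quoted
    // phrases, e.g. for [a, b, c]: a b c, "a b" c, a "b c", "a b c".
    static List<String> groupings(String[] terms) {
        List<String> out = new ArrayList<>();
        int gaps = terms.length - 1;
        // Each bit of 'mask' decides whether the corresponding gap
        // joins its two neighbours into the same phrase.
        for (int mask = 0; mask < (1 << gaps); mask++) {
            StringBuilder sb = new StringBuilder();
            boolean open = false;
            for (int i = 0; i < terms.length; i++) {
                boolean joinNext = i < gaps && (mask & (1 << i)) != 0;
                if (joinNext && !open) { sb.append('"'); open = true; }
                sb.append(terms[i]);
                if (!joinNext && open) { sb.append('"'); open = false; }
                sb.append(' ');
            }
            out.add(sb.toString().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : groupings(new String[] {"a", "b", "c"})) {
            System.out.println(s);
        }
    }
}
```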
Lucene seems to prefer matches in shorter documents. Is it possible to
influence the scoring mechanism to have matches in shorter fields score
higher instead?
For example, a query for europe should rank:
1. title:Europe
2. title:History of Europe
3. title:Travel in Europe, Middle East and
Paul Elschot wrote:
For example, a query for europe should rank:
1. title:Europe
2. title:History of Europe
3. title:Travel in Europe, Middle East and Africa
4. subtitle:Fairy Tales from Europe
Perhaps with this query (assuming the default implicit OR):
title:europe subtitle:europe^0.5
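For context, Lucene's DefaultSimilarity already gives shorter fields a boost via the length norm, 1/sqrt(number of terms in the field). The sketch below just evaluates that formula for the example titles; note that in the index the value is additionally quantized to a single byte per document, which can blur small length differences:

```java
public class LengthNorm {
    // DefaultSimilarity's length normalization: shorter fields get a
    // larger multiplier, 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(1)); // title:Europe
        System.out.println(lengthNorm(3)); // title:History of Europe
        System.out.println(lengthNorm(7)); // a longer title
    }
}
```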