Thanks Mikhail.
On 2/13/20 5:05 AM, Mikhail Khludnev wrote:
Hello,
I picked the first two questions to reply to.
> does this class offer any Shingling capability embedded to it?
>
No, it doesn't allow expanding a wildcard phrase with shingles.
> I could not find any api within this class ComplexPhraseQueryParser for
> that purpose.
>
There is none.
>
org.apache.lucene.search.PhraseWildcardQuery
looks very good, I hope this makes it into a Lucene build soon.
Thanks
> On Feb 12, 2020, at 10:01 PM, baris.ka...@oracle.com wrote:
Thanks David, can I look at the source code?
I think ComplexPhraseQueryParser uses
something similar.
I will check the differences, but do you know the differences for quick
reference?
Thanks
> On Feb 12, 2020, at 6:41 PM, David Smiley wrote:
>
Hi,
See org.apache.lucene.search.PhraseWildcardQuery in Lucene's sandbox
module. It was recently added by my amazing colleague Bruno. At this time
there is no query parser in Lucene that uses it, unfortunately, but you can
rectify this for your own purposes. I hope this query "graduates" to
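For anyone wanting to rectify that themselves, the glue a custom parser needs starts with splitting a quoted phrase into terms and flagging which ones carry a trailing wildcard, before building one clause per term. A minimal stdlib sketch (not Lucene API; the class and method names here are made up):

```java
import java.util.LinkedHashMap;

// Sketch: pre-parse a quoted phrase into (term, isPrefixWildcard) pairs,
// the kind of step a custom query parser would do before handing each
// term to something like PhraseWildcardQuery. Names are hypothetical.
public class PhraseWildcardSplit {
    // Returns term -> hasTrailingWildcard, preserving phrase order.
    static LinkedHashMap<String, Boolean> split(String phrase) {
        LinkedHashMap<String, Boolean> out = new LinkedHashMap<>();
        for (String raw : phrase.trim().split("\\s+")) {
            boolean wildcard = raw.endsWith("*");
            out.put(wildcard ? raw.substring(0, raw.length() - 1) : raw, wildcard);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("term1 term2*")); // {term1=false, term2=true}
    }
}
```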
Hi,-
Regarding the mechanisms I mentioned below,
does this class offer any shingling capability embedded in it?
I could not find any API within this class ComplexPhraseQueryParser for
that purpose.
For instance, does this class offer a most-commonly-used-words API?
I can then use one of
Thanks, but I thought this class would have a mechanism to fix this issue.
Thanks
> On Feb 4, 2020, at 4:14 AM, Mikhail Khludnev wrote:
>
It's slow per se, since it loads term positions. The usual advice is
shingling or edge n-grams. Note, if this is not text but a string or enum,
it probably lets you apply other tricks. Another idea is that perhaps
IntervalQueries can be smarter and faster in certain cases, although they
are backed on
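To illustrate the advice above, here is a plain-Java sketch of the tokens that index-time expansion would produce: shingles turn a two-word phrase into a single term, and edge n-grams turn a prefix query into an exact term lookup. This only shows the token generation; in Lucene the equivalent work is done at analysis time by components such as ShingleFilter and EdgeNGramTokenFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexTimeExpansion {
    // Bigram shingles: "new york city" -> ["new york", "york city"],
    // so a two-word phrase query becomes a single-term lookup.
    static List<String> shingles(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++)
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        return out;
    }

    // Edge n-grams: "term" -> ["t", "te", "ter", "term"], so a prefix
    // query ("ter*") becomes an exact term lookup at search time.
    static List<String> edgeNgrams(String token, int min) {
        List<String> out = new ArrayList<>();
        for (int len = min; len <= token.length(); len++)
            out.add(token.substring(0, len));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(Arrays.asList("new", "york", "city")));
        System.out.println(edgeNgrams("term", 1));
    }
}
```

Both trade index size for query speed, which is why they are the usual answer to slow wildcard phrases.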
How can this slowdown be resolved?
Is this another limitation of this class?
Thanks
> On Feb 3, 2020, at 4:14 PM, baris.ka...@oracle.com wrote:
>
Please ignore the first comparison there. I was comparing {term1
with 2 chars} vs {term1 with >= 5 chars + term2 with 1 char}.
The slowdown is:
The query "term1 term2*" slows down 400 times (~1500 millisecs) compared
to "term1*" when term1 has >5 chars and term2 is still 1 char.
Best
Hi,-
I hope everyone is doing great.
I saw this issue with this class such that if you search for "term1*"
it is good (i.e., 4 millisecs when it has >= 5 chars and ~250
millisecs when it is 2 chars),
but when you search for "term1 term2*" where term2 is a single
char, the
From: Erick Erickson erickerick...@gmail.com
To: java-user@lucene.apache.org; sol myr solmy...@yahoo.com
Cc:
Sent: Sunday, October 23, 2011 7:18 PM
Subject: Re: performance question - number of documents
"Why would it matter...top 5 matches?" Because Lucene has to calculate
the score of all documents in order to ensure that it returns those 5 documents.
What if the very last document scored was the most relevant?
Best
Erick
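Erick's point is the standard bounded-heap (top-k) collection: every matching document flows through a small min-heap, so the cost scales with the number of matches even though only 5 results come back. A stdlib sketch of the idea:

```java
import java.util.*;

public class TopK {
    // Every (docId, score) pair must pass through the heap: the true
    // top k is only known after the LAST match has been scored.
    static List<Integer> topDocs(Map<Integer, Double> scores, int k) {
        PriorityQueue<Map.Entry<Integer, Double>> heap =
            new PriorityQueue<>(Map.Entry.comparingByValue()); // min-heap on score
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll(); // evict the current lowest score
        }
        List<Integer> ids = new ArrayList<>();
        while (!heap.isEmpty()) ids.add(heap.poll().getKey());
        Collections.reverse(ids); // highest score first
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, Double> scores = new LinkedHashMap<>();
        scores.put(1, 0.2); scores.put(2, 0.9); scores.put(3, 0.5);
        System.out.println(topDocs(scores, 2)); // [2, 3]
    }
}
```

The heap keeps the collection cheap per document, but there is no way to skip scoring a match that might turn out to be the best one.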
On Sun, Oct 23, 2011 at 3:06 PM, sol myr solmy...@yahoo.com wrote:
Hi,
Searching billions of anything is likely to be challenging. Mark
Miller's document at
http://www.lucidimagination.com/content/scaling-lucene-and-solr looks
well worth a read.
Thank you for the reply, if you need more info to understand the question,
I'll try to be as prompt as possible.
- if I search on last week's index and the individual index (this needs to
be opened at search request!?) will it be faster than using a single huge
index for all groups, for all
Hello,
My name is Mihai and I'm trying to write a Java search (later I'll need to
port it to PyLucene) over billions of mentions, like Twitter statuses. Mentions
are grouped by some containing keywords.
I'm thinking of partitioning the index for faster results as follows:
Hi Greg,
Thanks for the quick and detailed answer.
What kind of queries do you run? Is it going to work for
SpanNearQueries/SpanNotQueries as well?
Do you also get the word itself at each position?
It would be great if I could search on the content of each payload as well,
but since the payload
The queries I'm doing really aren't anything clever...just searching for
phrases on pages of text, sometimes narrowing results by other words that
must appear on the page, or words that cannot appear on the same page. I
don't have experience with those span queries so I can't say much about
them.
Hi,
Can you please shed some light on what your final architecture looks like?
Do you manually use the PayloadSpanUtil for each document separately?
How did you solve the problem with phrase results?
Thanks in advance for your time,
Eran.
On Tue, Nov 25, 2008 at 10:30 PM, Greg Shackles [EMAIL
Sure, I'm happy to give some insight into this. My index itself has a few
fields - one that uniquely identifies the page, one that stores all the text
on the page, and then some others to store characteristics. At indexing
time, the text field for each document is manually created by
Just wanted to post a little follow-up here now that I've gotten through
implementing the system using payloads. Execution times are phenomenal!
Things that took over a minute to run in my old system take fractions of a
second to run now. I would also like to thank Mark for being very
responsive
On Wed, Nov 19, 2008 at 12:33 PM, Greg Shackles [EMAIL PROTECTED] wrote:
In the searching phase, I would run the search across all page documents,
and then for each of those pages, do a search with
PayloadSpanUtil.getPayloadsForQuery that made it so it only got payloads for
each page at a
Yeah, discussion came up on order and I believe we punted - it's up to
you to track order and sort at the moment. I think that was to prevent
those that didn't need it from paying the sort cost, but I have to go
find that discussion again (maybe it's in the issue?) I'll look at the
whole idea
Thanks for the update, Mark. I guess that means I'll have to do the sorting
myself - that shouldn't be too hard, but the annoying part would just be
knowing where one result ends and the next begins since there's no guarantee
that they'll always be the same. Let me know if you find any
I have a couple quick questions...it might just be because I haven't looked
at this in a week now (got pulled away onto some other stuff that had to
take priority).
Hi,
I have the same need - to obtain attributes for terms stored in some
field. I also need all the results and can't take just the first few docs.
I'm using an older version of Lucene and the method I'm using right now is
this:
1. Store the words as usual in some field.
2. Store the attributes of
I hope this isn't a dumb question or anything, I'm fairly new to Lucene so
I've been picking it up as I go pretty much. Without going into too much
detail, I need to store pages of text, and for each word on each page, store
detailed information about it. To do this, I have 2 indexes:
1) pages:
If I may suggest, could you expand upon what you're trying to
accomplish? Why do you care about the detailed information
about each word? The reason I'm suggesting this is the XY
problem. That is, people often ask for details about a specific
approach when what they really need is a different
Hi Erick,
Thanks for the response, sorry that I was somewhat vague in the reasoning
for my implementation in the first post. I should have mentioned that the
word details are not details of the Lucene document, but are attributes
about the word that I am storing. Some examples are position on
If you're new to Lucene, this might be a little much (and maybe I don't
fully understand the problem), but you might try:
Add the attributes to the words in a payload with a PayloadAnalyzer. Do
searching as normal. Use the new PayloadSpanUtil class to get the
payloads for the matching words.
Hey Mark,
This sounds very interesting. Is there any documentation or examples I
could see? I did a quick search but didn't really find much. It might just
be that I don't know how payloads work in Lucene, but I'm not sure how I
would see this actually doing what I need. My reasoning is
Here is a great PowerPoint on payloads from Michael Busch:
www.us.apachecon.com/us2007/downloads/AdvancedIndexing*Lucene*.ppt.
Essentially, you can store metadata at each term position, so it's an
excellent place to store attributes of the term - they are very fast to
load, efficient, etc.
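As a plain-Java sketch of the shape of that data (not the Lucene payload API; names here are made up): metadata lives alongside each term position, and only the positions of a matching term need to be loaded.

```java
import java.util.*;

public class PayloadSketch {
    // term -> (position -> attributes): the shape of a payload-carrying
    // posting list, with metadata stored at each term position.
    static Map<String, Map<Integer, String>> postings = new HashMap<>();

    static void add(String term, int position, String attributes) {
        postings.computeIfAbsent(term, t -> new TreeMap<>()).put(position, attributes);
    }

    // For a matching term, fetch the payloads only at its positions.
    static Map<Integer, String> payloads(String term) {
        return postings.getOrDefault(term, Map.of());
    }

    public static void main(String[] args) {
        add("lucene", 0, "x=12,y=40,size=10"); // made-up attribute format
        add("rocks", 1, "x=80,y=40,size=10");
        System.out.println(payloads("lucene")); // {0=x=12,y=40,size=10}
    }
}
```

In real Lucene the attributes would be encoded as a byte[] payload by the analyzer and read back through the spans, but the access pattern is the same: per-position, per-matching-term.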
Greg Shackles wrote:
Thanks! This all actually sounds promising, I just want to make sure I'm
thinking about this correctly. Does this make sense?
Indexing process:
1) Get list of all words for a page and their attributes, stored in some
sort of data structure
2) Concatenate the text from
Right, sounds like you have it spot on. That second part of step 3 looks like a
possibly tricky part.
I agree that it will be the tricky part but I think as long as I'm careful
with counting as I iterate through it should be ok (I probably just doomed
myself by saying that...)
Right...you'd do it
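The careful counting discussed above can be sketched like this: while concatenating the page's words, record each word's token position and character start offset, so the attributes stored per word line up with the indexed text (a stdlib sketch; names are made up):

```java
import java.util.*;

public class ConcatWithPositions {
    // Concatenate words with single spaces while recording, per word,
    // its token position and character start offset. The offsets let
    // per-word attributes line up with positions in the indexed text.
    static List<int[]> offsets(List<String> words, StringBuilder text) {
        List<int[]> out = new ArrayList<>(); // each entry: {position, startOffset}
        for (int pos = 0; pos < words.size(); pos++) {
            if (pos > 0) text.append(' ');
            out.add(new int[] { pos, text.length() });
            text.append(words.get(pos));
        }
        return out;
    }

    public static void main(String[] args) {
        StringBuilder page = new StringBuilder();
        List<int[]> map = offsets(Arrays.asList("hello", "wide", "world"), page);
        System.out.println(page);                        // hello wide world
        System.out.println(Arrays.toString(map.get(2))); // [2, 11]
    }
}
```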
itself to happen in the order of a
few milliseconds irrespective of the number of documents it matched. Am I
expecting too much ?
--
View this message in context:
http://www.nabble.com/Search-performance-question-tf4391551.html#a12520740
Sent from the Lucene - Java Users mailing list archive
You're not expecting too much. On cheap hardware I watch searches on over
5 mil + docs that match every doc come back in under a second. Are you able to
post your search code?
makkhar wrote:
Hi,
I have an index which contains more than 20K documents. Each document has
the following structure :
On 6-Sep-07, at 4:41 AM, makkhar wrote:
Hi,
I have an index which contains more than 20K documents. Each document has
the following structure:
field: ID (index and store), typical value - 1000
field: parameterName (index and store), typical value -
Hi All,
I have a sort performance question:
I have a fairly large index consisting of chunks of full-text
transcriptions of television, radio and other media, and I'm trying to
make it searchable and sortable by date. The search front-end uses a
ParallelMultiSearcher to search up to three
Are you using a cached IndexSearcher such that successive sorts on
the same field will be more efficient?
Erik
On Mar 20, 2007, at 3:39 PM, David Seltzer wrote:
the very first search against the index.
How would a cached searcher implementation look?
-Dave
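In the simplest form, a cached searcher is just a lazily-initialized instance shared across requests, so whatever sort caches are built on the first search survive for later ones. A generic stdlib sketch (in real code the Supplier would wrap new IndexSearcher(...); this holder is hypothetical glue, not a Lucene class):

```java
import java.util.function.Supplier;

// Sketch: hold one searcher for the life of the process instead of
// opening a new one per request. Double-checked locking makes get()
// safe to call from concurrent request threads.
public class SearcherHolder<T> {
    private final Supplier<T> opener;
    private volatile T cached;

    public SearcherHolder(Supplier<T> opener) { this.opener = opener; }

    public T get() {
        T s = cached;
        if (s == null) {
            synchronized (this) {
                if (cached == null) cached = opener.get(); // open once
                s = cached;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        SearcherHolder<Object> holder = new SearcherHolder<>(Object::new);
        System.out.println(holder.get() == holder.get()); // true: same instance reused
    }
}
```

The only extra wrinkle in practice is invalidating the cached instance when the index is updated.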
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 20, 2007 4:03 PM
To: java-user@lucene.apache.org
Subject: Re: Sort Performance Question
Initially I was sorting based on a unixtime field, but having read up on it, I
Re: Text storing design and performance question
Renaud, one optimization you can do on this is to try the first 10kb, see if
it finds text worth highlighting, if not, with a slight overlap try the next
9.9kb - 19.9kb or just 9.9kb - end if you're feeling lazy.
This assumes that most good matches
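That chunk-and-overlap scan could look like the following stdlib sketch: the overlap carries a match across a chunk boundary, and scanning stops at the first window that contains anything worth highlighting (here reduced to a plain substring test; a real version would run the highlighter on each window):

```java
public class ChunkedScan {
    // Scan text in windows of `chunk` chars with `overlap` chars of
    // carry-over (so a match straddling a boundary isn't missed),
    // stopping at the first window containing the needle.
    // Requires overlap < chunk, or the loop would not advance.
    static int firstMatchingWindow(String text, String needle, int chunk, int overlap) {
        for (int start = 0; start < text.length(); start += chunk - overlap) {
            int end = Math.min(start + chunk, text.length());
            if (text.substring(start, end).contains(needle)) return start;
            if (end == text.length()) break; // reached the end without a hit
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "x".repeat(97) + "match" + "x".repeat(50);
        // chunk=100, overlap=10: the word straddles offset 100, but the
        // second window (starting at 90) still catches it.
        System.out.println(firstMatchingWindow(text, "match", 100, 10)); // 90
    }
}
```

Since most good matches tend to appear early in a document, the first window usually wins and the rest of the text is never touched.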
In general, if you are having performance issues with highlighting, the
first thing to do is double-check what the bottleneck is: is it accessing
the text to be highlighted, or is it running the highlighter?
You suggested earlier in the thread that the problem was with accessing
the text...
Because I have duplicated data, one in the index and the other in the db,
are there other ways of handling this situation in a more efficient and
performant way? Thanks in advance.
-los
--
View this message in context:
http://www.nabble.com/Text-storing-design-and-performance-question-tf2953201
storing term vectors would keep the index lean and
allow for fast highlighting?
--Renaud
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 10, 2007 9:54 AM
To: java-user@lucene.apache.org
Subject: Re: Text storing design and performance question
Being stateless should not be much of an issue. As Erick mentioned, the
highlighter just
Maybe keeping the data in the DB would make it quicker? Seems like the I/O
performance would cause most of the performance issues you're seeing.
-los
Renaud Waldura-5 wrote:
We used to store a big text
Subject: Re: Performance question
I was reading a book on SQL query tuning. The gist of it was that the
way to get the best performance (fastest execution) out of a SQL select
statement was to create execution plans where the most selective term
in the where clause is used first, the next most selective term is
used next, etc.
Does it matter what order I add the sub-queries to the BooleanQuery Q?
That is, is the execution speed for the search faster (slower) if I do:
Q.add(Q1, BooleanClause.Occur.MUST);
Q.add(Q2, BooleanClause.Occur.MUST);
Q.add(Q3, BooleanClause.Occur.MUST);
As
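For what it's worth, my understanding is that Lucene's conjunction scoring drives the intersection from the most selective clause itself, whatever order the clauses were added in, so add order should matter little. The idea, reduced to a stdlib sketch over doc-id sets (this is an illustration of the principle, not Lucene's actual scorer):

```java
import java.util.*;

public class SelectiveFirst {
    // Intersect MUST-clause doc-id sets starting from the smallest
    // (most selective) set: the bigger sets are probed only for its
    // few candidates, regardless of the order the clauses arrived in.
    static Set<Integer> intersect(List<Set<Integer>> clauses) {
        List<Set<Integer>> byCost = new ArrayList<>(clauses);
        byCost.sort(Comparator.comparingInt(Set::size)); // cheapest drives
        Set<Integer> result = new TreeSet<>(byCost.get(0));
        for (int i = 1; i < byCost.size(); i++) result.retainAll(byCost.get(i));
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> q1 = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        Set<Integer> q2 = new HashSet<>(Arrays.asList(2, 4));          // rare term
        Set<Integer> q3 = new HashSet<>(Arrays.asList(1, 2, 4, 6, 8));
        System.out.println(intersect(Arrays.asList(q1, q2, q3))); // [2, 4]
    }
}
```

This is the same "most selective term first" principle as in the SQL book, just applied automatically by the engine rather than by the query writer.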
Hi,
I am using a multi-threaded app to index a bunch of data. The app spawns
X number of threads. Each thread writes to a RAMDirectory. When a thread
finishes its work, the contents of the RAMDirectory are written into
the FSDirectory. All threads are passed an instance of the FSWriter when
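The per-thread-buffer pattern described above can be sketched with an ExecutorService: each task fills a private buffer (standing in for its RAMDirectory) and hands it to a shared sink (standing in for the shared FSDirectory writer) in one coarse-grained step at the end, so contention happens once per thread rather than once per document. The stand-in types here are not Lucene classes.

```java
import java.util.*;
import java.util.concurrent.*;

public class PerThreadBuffers {
    static int run(int threads, int docsPerThread) {
        // Shared sink, touched only once per task (like the FSDirectory writer).
        List<String> shared = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                // Private buffer per task (like a per-thread RAMDirectory):
                // no contention while the documents are being built.
                List<String> ram = new ArrayList<>();
                for (int d = 0; d < docsPerThread; d++) ram.add("doc-" + id + "-" + d);
                shared.addAll(ram); // single coarse hand-off per thread
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 3)); // 12
    }
}
```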
Hi,
My Lucene index is not big (about 150M). My computer has 2G RAM but for some
reason when I'm trying to store my index
using org.apache.lucene.store.RAMDirectory it fails with a Java out-of-memory
exception. Also, sometimes for the same
search query the time spent on search can rise 10-20
What is your Java max heap size set to? This is the -Xmx Java option.
Daniel Feinstein wrote:
Hi,
My lucene index is not big (about 150M). My computer has 2G RAM but for some reason when I'm trying to store my index
using org.apache.lucene.store.RAMDirectory it fails with java out of memory
-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: 12 November 2005 01:39
To: java-user@lucene.apache.org
Subject: Re: Performance Question
Look at IndexReader.open()
It actually uses a MultiReader if there are multiple segments.
-Yonik
Now hiring
I have several indexes I want to search together. What performs better a
single searcher on a multi reader or a single multi searcher on multiple
searchers (1 per index).
Thanks
Mike
The IndexSearcher(MultiReader) will be faster (it's what's used for
indices with multiple segments too).
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
You should run your own tests, but I found the MultiReader to be slower
than a regular IndexReader. I was running on a dual-cpu box and two
separate disk drives.
Charles.
I have 5 indexes, each one is 6GB...I need 512MB of heap size in order to open
the index and run all types of queries. My question is, is it better to just
have one large 30GB index? Will increasing the heap size increase performance?
Can I store an instance of MultiSearcher (OR just Searcher in