Hi,
I'm going to replace an old reader/writer synchronization mechanism we had
implemented with the new near realtime search facilities in Lucene 2.9.
However, it's still a bit unclear how to do it efficiently.
Is the following implementation a good way to achieve it? The context
is
Hi ,
I am using StandardAnalyzer for indexing as well as searching the
indexes, but my search doesn't work correctly with special characters. I am
storing some special characters in a field called TransType, i.e.
document.add(new Field(TransType, db92fb60-b716-11de-8718-001a4bc7d46e,
Uwe Schindler wrote:
I forgot: the format of numeric fields is also not plain text, so a
simple TermQuery as generated by your query parser will not work
either.
If you want to hit numeric values without a NumericRangeQuery with lower and
upper bound equal, you have to use
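The baseline approach the message mentions, a NumericRangeQuery whose lower and upper bounds are the same value, might look like this in Lucene 2.9 (the field name "price", the value 42, and the precisionStep of 4 are illustrative; the precisionStep must match the one used when the NumericField was indexed):

```java
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class ExactNumericMatch {
    // An exact numeric match: a range whose lower and upper bounds are the
    // same value, both inclusive. "price", 42L and precisionStep 4 are
    // illustrative; the precisionStep must match the field at index time.
    static Query exactLong(String field, long value) {
        return NumericRangeQuery.newLongRange(field, 4, value, value, true, true);
    }

    public static void main(String[] args) {
        System.out.println(exactLong("price", 42L));
    }
}
```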
Hi,
I have a question related to faceted search. My index contains more than 1
million documents, and nearly 1 million terms. My aim is to get a DocIdSet
for each term occurring in the result of a query. I use the approach
described on
On Monday 12 October 2009 14:53:45 Christoph Boosz wrote:
Hi,
I have a question related to faceted search. My index contains more than 1
million documents, and nearly 1 million terms. My aim is to get a DocIdSet
for each term occurring in the result of a query. I use the approach
described
Can you also print the upper and lower term, or the term you received in
newRangeQuery and newTermQuery, to System.out? Maybe it is somehow
converted by the Analyzer that is used for parsing the query.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail:
Uwe Schindler wrote:
Can you print the upper and lower term or the term you received in
newRangeQuery and newTermQuery also to System.out? Maybe it is converted
somehow by your Analyzer, that is used for parsing the query.
Given you have 1M docs and about 1M terms, do you see very few docs per
term?
If your DocSet per term is very sparse, a BitSet is probably not a good
representation. A simple int array may be better for memory, and faster
for iterating.
-John
On Mon, Oct 12, 2009 at 8:45 AM, Paul Elschot
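To put rough numbers on John's point, assuming 1M documents and a term matching only a handful of them (back-of-the-envelope figures, not a benchmark):

```java
import java.util.BitSet;

public class SparseDocSet {
    // A BitSet sized to maxDoc costs one bit per document in the index,
    // whether or not the term matches it.
    static long bitSetBytes(int maxDoc) {
        return maxDoc / 8;              // 1M docs -> ~125 KB per term
    }

    // A sorted int[] of matching doc ids costs 4 bytes per hit only.
    static long intArrayBytes(int numHits) {
        return 4L * numHits;            // 10 hits -> 40 bytes
    }

    // Iterating the sparse form is a plain scan over the hits,
    // instead of probing every bit of the full BitSet.
    static int countHits(int[] docIds) {
        int n = 0;
        for (int i = 0; i < docIds.length; i++) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(bitSetBytes(1000000));  // 125000
        System.out.println(intArrayBytes(10));     // 40
        BitSet dense = new BitSet(1000000);        // dense alternative
        dense.set(42);
        System.out.println(dense.cardinality());   // 1
    }
}
```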
Thanks a lot. I think TermPositionsVector will solve my problem,
although it seems to be a little inefficient.
Concerning the term representation: our data is way more complex than
just phrasal annotation; it was just an example, because I am not
allowed to talk about our internal organisation. I
Hi Cedric,
I don't know of anyone with a substantial throughput production system who
is doing realtime search with the 2.9 improvements yet (and in fact, no
serious performance analysis has been done on these even in the lab so to
speak: follow https://issues.apache.org/jira/browse/LUCENE-1577
On Mon, Oct 12, 2009 at 3:17 PM, Jake Mannix jake.man...@gmail.com wrote:
Wait, so according to the javadocs, the IndexReader which you got from
the IndexWriter forwards calls to reopen() back to IndexWriter.getReader(),
which means that if the user has a NRT reader, and the user keeps calling
Hi Jake,
Thanks for your helpful explanation.
In fact, my initial solution was to traverse each document in the result
once and count the contained terms. As you mentioned, this process took a
lot of memory.
Trying to confine the memory usage with the facet approach, I was surprised
by the
On Mon, Oct 12, 2009 at 12:26 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 12, 2009 at 3:17 PM, Jake Mannix jake.man...@gmail.com
wrote:
Wait, so according to the javadocs, the IndexReader which you got from
the IndexWriter forwards calls to reopen() back to
Wow! This is awesome. Can't wait to see how it plays with Bobo :)
On Sun, Oct 11, 2009 at 10:19 PM, John Wang john.w...@gmail.com wrote:
Hi guys:
The new FieldComparator api looks really scary :)
But after some perf testing with numbers I'd like to share, I guess it
is worth it:
HW:
I have documents that store multiple values in some fields (using the
document.add(new Field()) with the same field name). Here's what a
typical document looks like:
doc.option=value1 aaa
doc.option=value2 bbb
doc.option=value3 ccc
I want my queries to only match individual values, for
Oh, that is really good to know!
Is this deterministic? E.g. as long as writer.addDocument() is called, does the
next getReader reflect the change? Does it work with deletes, e.g.
writer.deleteDocuments()?
Thanks Mike for clarifying!
-John
On Mon, Oct 12, 2009 at 12:11 PM, Michael McCandless
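The behavior being asked about can be sketched against the 2.9 API as follows; this is a minimal, hedged example (RAMDirectory and the "id" field are illustrative), showing adds and deletes reaching a reader with no commit() anywhere:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NrtSketch {
    // Returns {numDocs after add, numDocs after delete}, without any commit().
    static int[] docCounts() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
        IndexReader r1 = writer.getReader();     // sees the uncommitted add

        writer.deleteDocuments(new Term("id", "1"));
        IndexReader r2 = r1.reopen();            // forwards to writer.getReader()

        int[] counts = { r1.numDocs(), r2.numDocs() };
        r2.close(); r1.close(); writer.close(); dir.close();
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] c = docCounts();
        System.out.println(c[0] + " " + c[1]);
    }
}
```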
Hi Eric,
To achieve what you want, do not tokenize the values you query/add to this
field.
On Mon, Oct 12, 2009 at 4:05 PM, Angel, Eric ean...@business.com wrote:
I have documents that store multiple values in some fields (using the
document.add(new Field()) with the same field name). Here's
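A sketch of that suggestion using Eric's example values: with Field.Index.NOT_ANALYZED each added value stays one indexed term, so an exact TermQuery hits a whole value and nothing inside it.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class UntokenizedOption {
    public static void main(String[] args) {
        Document doc = new Document();
        // Each value becomes a single indexed term such as "value1 aaa",
        // rather than two tokens "value1" and "aaa".
        doc.add(new Field("option", "value1 aaa", Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        doc.add(new Field("option", "value2 bbb", Field.Store.YES,
                Field.Index.NOT_ANALYZED));

        // Matches exactly one whole value, never a word within a value.
        TermQuery q = new TermQuery(new Term("option", "value1 aaa"));
        System.out.println(q);
    }
}
```

The trade-off, raised later in the thread, is that an untokenized field gets no analysis, so stemming and per-word matching are lost.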
Or else just make sure that you use PhraseQuery to hit this field when you
want value1 aaa. If you don't tokenize these pairs, then you will have to
do prefix/wildcard matching to hit just value1 by itself (if this is
allowed
by your business logic).
-jake
On Mon, Oct 12, 2009 at 1:21 PM,
Guys, please - you're not new at this... this is what JavaDoc is for:
/**
* Returns a readonly reader containing all
* current updates. Flush is called automatically. This
* provides near real-time searching, in that changes
* made during an IndexWriter session can be made
*
I need to analyze these values since I also want the benefits of the
PorterStemmer. The problem with using PhraseQuery is that I don't
always know the slop. I may have values like value4 ddd aaa. It's a
tricky problem because I think Lucene sees all these values as one long
value for the field option.
Thanks Yonik,
It may be surprising, but in fact I have read that
javadoc. It talks about not needing to close the
writer, but doesn't specifically talk about what
the relationship between commit() calls and
getReader() calls is. I suppose I should have
interpreted:
@returns a new reader
I think Lucene sees all these values as one long
value for the field option
Not quite. Starting with the second add, a call will be made to
getPositionIncrementGap in your analyzer. If you return a number
larger than one, then the positions between the last term of the preceding
add and the first
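A sketch of such an analyzer (the WhitespaceAnalyzer delegate and the gap of 10 are illustrative choices, not taken from the thread; any gap larger than the slop you query with keeps a PhraseQuery from matching across two values):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class GapAnalyzer extends Analyzer {
    private final Analyzer delegate = new WhitespaceAnalyzer();

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return delegate.tokenStream(fieldName, reader);
    }

    // Called between successive doc.add(new Field("option", ...)) values;
    // the returned gap is added to the position of the next value's first
    // token, separating values in the position space.
    public int getPositionIncrementGap(String fieldName) {
        return 10;
    }
}
```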
Hi Cedric,
There is a wiki page on NRT at:
http://wiki.apache.org/lucene-java/NearRealtimeSearch
Feel free to ask questions if there's not enough information.
-J
On Mon, Oct 12, 2009 at 2:24 AM, melix cedric.champ...@lingway.com wrote:
Hi,
I'm going to replace an old reader/writer
Chris,
You could also store term vectors for all docs at indexing
time, and add the termvectors for the matching docs into a
(large) map of terms in RAM.
Regards,
Paul Elschot
On Monday 12 October 2009 21:30:48 Christoph Boosz wrote:
Hi Jake,
Thanks for your helpful explanation.
In fact,
I agree, the javadocs could be improved. How about something like
this for the first 2 paragraphs:
* Returns a readonly reader, covering all committed as
* well as un-committed changes to the index. This
* provides near real-time searching, in that changes
* made during an
On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix jake.man...@gmail.com wrote:
It may be surprising, but in fact I have read that
javadoc.
It was not your email I responded to.
It talks about not needing to close the
writer, but doesn't specifically talk about what
the relationship between
That seems a lot more straightforward Mike, thanks.
-jake
On Mon, Oct 12, 2009 at 1:56 PM, Michael McCandless
luc...@mikemccandless.com wrote:
I agree, the javadocs could be improved. How about something like
this for the first 2 paragraphs:
* Returns a readonly reader, covering all
OK I just committed it -- thanks!
Mike
On Mon, Oct 12, 2009 at 5:01 PM, Jake Mannix jake.man...@gmail.com wrote:
That seems a lot more straightforward Mike, thanks.
-jake
On Mon, Oct 12, 2009 at 1:56 PM, Michael McCandless
luc...@mikemccandless.com wrote:
I agree, the javadocs could be
On Mon, Oct 12, 2009 at 1:57 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix jake.man...@gmail.com
wrote:
It may be surprising, but in fact I have read that
javadoc.
It was not your email I responded to.
Sorry, my bad then - you said "guys"
I think it was my email Yonik responded to and he is right, I was being lazy
and didn't read the javadoc very carefully. My bad.
Thanks for the javadoc change.
-John
On Mon, Oct 12, 2009 at 1:57 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix
Hi Paul,
Thanks for your suggestion. I will test it within the next few days.
However, due to memory limitations, it will only work if the number of hits
is small enough, am I right?
Chris
2009/10/12 Paul Elschot paul.elsc...@xs4all.nl
Chris,
You could also store term vectors for all docs
I still see some things we might want to document or explain:
We still need to be careful what the call to isCurrent()
will mean in the future for IndexReaders - as now there is another
kind of current - current even up to uncommitted changes.
Imagine the following set of IndexReaders floating
Ok, thanks for the details. I see I'm not the only one finding the javadoc
hard to understand. While this is well documented, it's still not clear
enough about the exact semantics of changes: at first I thought it
returned an IndexReader on the *uncommitted changes only*, which meant it did
not
Good point on isCurrent - I think it should only be with respect to
the latest index commit point, and we should clarify that in the
javadoc.
[...]
// but what does the nrtReader say?
// it does not have access to the most recent commit
// state, as there's been a commit (with documents)
//
Erick,
Thank you. This is awesome. I got it to work by just setting slop to 1
and returning 10 in my analyzer.getPositionIncrementGap. Here are my
tests in case anyone else is interested:
public class TestPositionIncrementGap extends TestCase {
Analyzer analyzer = new
Hi,
I am trying to compute the counts of terms of the documents returned
by running a query using a TermVectorMapper.
I was wondering if anyone knew if there was a faster way to do this
rather than using a HashMap with a TermVectorMapper to store the
counts of the terms and calling
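The HashMap aggregation described can be sketched without any Lucene types; the map(term, frequency) method below stands in for the TermVectorMapper callback of the same name, invoked once per term per matching document:

```java
import java.util.HashMap;
import java.util.Map;

public class TermCounter {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Stand-in for TermVectorMapper.map(...): add one document's frequency
    // for this term to the running total across all matching documents.
    public void map(String term, int frequency) {
        Integer prev = counts.get(term);
        counts.put(term, prev == null ? frequency : prev + frequency);
    }

    public int count(String term) {
        Integer c = counts.get(term);
        return c == null ? 0 : c;
    }

    public static void main(String[] args) {
        TermCounter tc = new TermCounter();
        tc.map("lucene", 2);   // doc 1
        tc.map("search", 1);   // doc 1
        tc.map("lucene", 3);   // doc 2
        System.out.println(tc.count("lucene")); // 5
    }
}
```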