No documents can be added to the index while the index is optimizing, and optimizing can't run while documents are being added to the index.
So, absent any other error, I think we can believe the two indexes are indeed the
same.
:)
2008/9/4 Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED]
The use case is as follows
If you're on Windows, the safest way to do this in general, if there
is any possibility that readers are still using the index, is to
create a new IndexWriter with create=true. Windows does not let you
remove open files. IndexWriter will gracefully handle failed deletes
by retrying.
YOU ARE FAST
thanks.
--Noble
On Thu, Sep 4, 2008 at 2:54 PM, Michael McCandless
[EMAIL PROTECTED] wrote:
Noble Paul നോബിള് नोब्ळ् wrote:
On Wed, Sep 3, 2008 at 2:06 PM, Michael McCandless
[EMAIL PROTECTED] wrote:
Noble Paul നോബിള് नोब्ळ् wrote:
On Tue, Sep 2, 2008 at 1:56 PM, Michael
Thanks for raising it!
It's through requests like this that Lucene's API improves.
Mike
Agree with Michael McCandless! That way, it is handled gracefully.
2008/9/4 Michael McCandless [EMAIL PROTECTED]
If you're on Windows, the safest way to do this in general, if there is any possibility that readers are still using the index, is to create a new IndexWriter with
Hello,
This came up before but - if we were to make a swear word filter, string edit distances are no good. For example, words like `shot` are confused with `shit`. There is also a problem with words like hitchcock. Apparently I need something like soundex or double metaphone. The thing is - these
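To make that point concrete, here is a minimal plain-Java Levenshtein sketch (the class and method names are illustrative, not from the thread). It shows that the innocent word `shot` and the deliberate misspelling `sh1t` both sit at edit distance 1 from `shit`, so no distance threshold can separate them:

```java
public class EditDistanceDemo {
    // Standard dynamic-programming Levenshtein distance.
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // The innocent word and the deliberate obfuscation are indistinguishable:
        System.out.println(levenshtein("shot", "shit")); // 1 (innocent near-miss)
        System.out.println(levenshtein("sh1t", "shit")); // 1 (real obfuscation)
    }
}
```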
Hello Jason,
I have been trying to do this for a long time on my own. Keep up the good work.
What I tried was a document cache using Apache Commons Collections, and before an index write/delete I would sync the cache with the index.
I am waiting for Lucene 2.4 to proceed. (delete by query)
Best.
On Wed, Sep
On 4 Sep 2008, at 14.38, Cam Bazz wrote:
Hello,
This came up before but - if we were to make a swear word filter, string edit distances are no good. For example, words like `shot` are confused with `shit`. There is also a problem with words like hitchcock. Apparently I need something like
On 4 Sep 2008, at 15.54, Cam Bazz wrote:
Yes, I already have a system for users reporting words. They land on an operator screen, and if the operator approves, or if 3 other people marked it as a curse word, then it is filtered.
In the other thread you wrote:
I would create 1-5 ngram sized shingles and
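A rough sketch of that shingle idea, assuming character n-grams of sizes 1-5 compared by Jaccard overlap (the names and the similarity measure are my illustration, not the poster's exact design):

```java
import java.util.HashSet;
import java.util.Set;

public class ShingleDemo {
    // All character n-grams of word, for sizes minN..maxN.
    public static Set<String> shingles(String word, int minN, int maxN) {
        Set<String> out = new HashSet<>();
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                out.add(word.substring(i, i + n));
            }
        }
        return out;
    }

    // Jaccard similarity of the two words' shingle sets, in [0, 1].
    public static double similarity(String a, String b) {
        Set<String> sa = shingles(a, 1, 5);
        Set<String> sb = shingles(b, 1, 5);
        Set<String> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // A near-miss scores much higher than an incidental substring match:
        System.out.println(similarity("shit", "shits"));     // high overlap
        System.out.println(similarity("shit", "hitchcock")); // low overlap
    }
}
```

Unlike edit distance, this scores `hitchcock` well below a genuine near-miss, because the shared n-grams are a small fraction of the union.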
Sorry, I should have said: you must always use the same writer, ie as
of 2.3, while IndexWriter.optimize (or normal segment merging) is
running, under one thread, another thread can use that *same* writer
to add/delete/update documents, and both are free to make changes to
the index.
Let me rephrase the problem. I already have a set of bad words. I want to prevent people from inputting typos of the bad words.
For example, 'shit' is banned, but someone may enter sh1t.
How can I flag inputs that are phonetically similar to the marked bad words?
Best.
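One way to sketch this, without depending on the Aspell patch or Commons Codec's DoubleMetaphone: first map common leetspeak digits back to letters, then compare a simplified Soundex code. All names here are illustrative, and the Soundex below omits the standard H/W rule:

```java
public class BadWordDemo {
    // Map common leetspeak digits back to letters (illustrative subset).
    public static String unleet(String s) {
        return s.toLowerCase()
                .replace('0', 'o').replace('1', 'i').replace('3', 'e')
                .replace('4', 'a').replace('5', 's').replace('7', 't');
    }

    // Soundex digit for a consonant; '0' means "carries no code".
    private static char code(char c) {
        switch (c) {
            case 'b': case 'f': case 'p': case 'v': return '1';
            case 'c': case 'g': case 'j': case 'k':
            case 'q': case 's': case 'x': case 'z': return '2';
            case 'd': case 't': return '3';
            case 'l': return '4';
            case 'm': case 'n': return '5';
            case 'r': return '6';
            default:  return '0';
        }
    }

    // Simplified American Soundex: first letter, then digits of the remaining
    // consonants with adjacent repeats collapsed, padded/cut to 4 characters.
    public static String soundex(String word) {
        String w = unleet(word);
        StringBuilder sb = new StringBuilder();
        sb.append(Character.toUpperCase(w.charAt(0)));
        char last = code(w.charAt(0));
        for (int i = 1; i < w.length() && sb.length() < 4; i++) {
            char c = code(w.charAt(i));
            if (c != '0' && c != last) sb.append(c);
            last = c;
        }
        while (sb.length() < 4) sb.append('0');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("sh1t"));                          // S300
        System.out.println(soundex("sh1t").equals(soundex("shit")));  // true
    }
}
```

Note that Soundex alone still collides on vowel-only differences (`shot` also encodes to S300), which is why the Double Metaphone or Aspell phonetic rules mentioned in this thread may be a better fit.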
On Thu, Sep 4, 2008 at 5:02 PM,
Hello,
I was reading the performance optimization guides when I found:
writer.setRAMBufferSizeMB()
combined with: writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
This can be used to flush automatically, so that if the RAM buffer size exceeds a certain limit it will flush.
now the
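For reference, the combination being discussed looks roughly like this against the 2.3-era API (an untested sketch; the directory path and analyzer choice are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class FlushByRamSketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/index"), // illustrative path
                new StandardAnalyzer(), true);
        // Disable the flush-by-document-count trigger...
        writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
        // ...so the writer flushes only when buffered docs exceed 48 MB of RAM.
        writer.setRAMBufferSizeMB(48.0);
        // ... add documents ...
        writer.close();
    }
}
```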
I see now, thanks Michael McCandless, good explanation!
2008/9/4, Michael McCandless [EMAIL PROTECTED]:
Sorry, I should have said: you must always use the same writer, ie as of
2.3, while IndexWriter.optimize (or normal segment merging) is running,
under one thread, another thread can use that
I submitted a patch to handle Aspell phonetic rules. You can find it in JIRA.
On Thu, 4 Sep 2008 17:07:09 +0300, Cam Bazz [EMAIL PROTECTED] wrote:
let me rephrase the problem. I already have a set of bad words. I want to
avoid people inputting typos of the bad words.
for example 'shit' is
Hello,
Anyone using RAM disks for storage? There is RamSan and there is also Fusion-io, but they are kinda expensive. Any other alternatives, I wonder?
Best.
Hi all,
Thanks a lot for such a quick reply.
Both scenarios sound good to me. I would like to do my best and try to implement one of them (as a proof of concept) and then incrementally improve, retest, investigate and rewrite :)
So, from the soap opera to the question part then:
Is there a way to turn on debug logging / trace logging for Lucene?
We have some code that uses Lucene which has been working perfectly well for several months.
Recently, a QA team in our organization has set up a server with a much larger data set than we have ever tested with in the past: the resulting Lucene index is about 3G in size.
On this particular
And what about the visibility filter? Are you sure no one else accesses the IndexReader and modifies the index? Check reader.maxDoc() to be confident.
On Fri, Sep 5, 2008 at 12:19 AM, Justin Grunau [EMAIL PROTECTED] wrote:
We have some code that uses lucene which has been working perfectly well
for
Sorry, I forgot to include the visibility filters:
final BooleanQuery visibilityFilter = new BooleanQuery();
visibilityFilter.add(new TermQuery(new Term("isPublic", "true")), Occur.SHOULD);
visibilityFilter.add(new TermQuery(new
On Thursday, 4 September 2008, Justin Grunau wrote:
Is there a way to turn on debug logging / trace logging for Lucene?
You can use IndexWriter's setInfoStream(). Besides that, Lucene doesn't do
any logging AFAIK. Are you experiencing any problems that you want to
diagnose with debugging?
Anyway it is worth trying (to ensure docs aren't removed between searches). What if you run MatchAllDocsQuery or something similar? Do you still get a different hit count when the query is rerun?
PS. I'm kind of a newbie with Lucene and the Lucene API, so don't take my notes too seriously :)
On Fri, Sep 5, 2008 at 12:46
Daniel, yes, please see my "Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds" thread.
- Original Message
From: Daniel Naber [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, September 4, 2008 6:10:56 PM
Subject:
On Thursday 04 September 2008 20:39:13, Mark Miller wrote:
Sounds like it's more in line with what you are looking for. If I remember correctly, the phrase query factors the edit distance into scoring, but the SpanNearQuery will just use the combined idf for each of the terms in it, so
Indeed, StandardAnalyzer removes the pluses, so it analyzes 'c++' to 'c'.
QueryParser includes terms that have been analyzed, while the hand-built BooleanQuery includes terms that haven't been analyzed.
I think this is the difference between them.
2008/9/4 Ian Lea [EMAIL PROTECTED]
Have a look at the index with Luke to
Honestly: your problem doesn't sound like a Lucene problem to me at all
... i would write custom code to check your files for the pattern you are
looking for. if you find it *then* construct a Document object, and add
your 3 fields. I probably wouldn't even use an analyzer.
-Hoss
The Javadoc for this method has the following comment:
This requires this index not be among those to be added, and the upper bound of those segment doc counts not exceed maxMergeDocs.
What does the second part of that mean, which is especially confusing given that
MAX_MERGE_DOCS is
: Now, I would like to access the best fragment offsets from each
: document (hits.doc(i)).
I seem to recall that the recommended method for doing this is to subclass your favorite Formatter and record the information from each TokenGroup before delegating to the super class.
but there
I am creating several temporary batches of indexes in separate indices and will periodically merge those batches into a set of master indices. I'm using IndexWriter#addIndexesNoOptimize(), but the problem that gives me is that the master may already contain the index for that document and I get a