On Aug 13, 2013, at 12:55 PM, Michael McCandless wrote:
I'm less familiar with the older highlighters but likely it's possible
to get the absolute offsets from them as well.
Using vector highlighter I've achieved that by extending and cloning the code
of
Non-technical users understand what a field is. Not all of them may
know that they can use them, but it's easy for them to learn that name:john
will search for john only in names.
Non-technical users can learn to understand that logic and functionality can be
specified in their
On May 22, 2013 at 20:29, Petite Abeille wrote:
On May 22, 2013, at 7:08 PM, Karl Wettin karl.wet...@kodapan.se wrote:
* Use a filter after ASCIIFoldingFilter that discriminates all use of ae,
oe, oo, and other combinations of double vowels, just keeping the first one.
I ended up
This is a question (or perhaps a line of thought) regarding the mutually
intelligible Scandinavian languages Danish, Norwegian and Swedish.
The Swedish letters åäö are in fact the same letters as the Danish/Norwegian
åæø. A Norwegian writing about the Swedish city of Göteborg writes Gøteborg, and
On May 22, 2013 at 14:37, Karl Wettin wrote:
* Use a filter after ASCIIFoldingFilter that discriminates all use of ae, oe,
oo, and other combinations of double vowels, just keeping the first one.
I ended up with that solution.
https://issues.apache.org/jira/browse/LUCENE-5013
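The double-vowel folding step described above can be sketched as a plain string function standing in for a real TokenFilter. The class name and the pair list are illustrative assumptions, not the actual code from LUCENE-5013:

```java
// Sketch of the normalization described above: after ASCIIFoldingFilter
// has mapped å/ä/ö and å/æ/ø down to ASCII, collapse vowel pairs such
// as ae, oe, aa, oo, ee to their first vowel so Swedish and
// Danish/Norwegian spellings index identically (e.g. "naeste" -> "naste",
// "book" -> "bok"). The pair list here is an assumption.
public class VowelPairCollapser {
    private static final java.util.Set<String> PAIRS =
        new java.util.HashSet<>(java.util.Arrays.asList("ae", "oe", "aa", "oo", "ee"));

    public static String collapse(String token) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            out.append(token.charAt(i));
            if (i + 1 < token.length()
                    && PAIRS.contains(token.substring(i, i + 2))) {
                i++; // keep the first vowel, skip the second
            }
        }
        return out.toString();
    }
}
```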
The simplest solution is to use slop in PhraseQuery, SpanNearQuery,
etc. Also consider permutations of #isInOrder() with alternative query
boosts.
Even though slop will create a greater score the closer the terms are, it might
still in some cases (usually when combined with other
something
like "your proximity query"~20, but consider the cost of a large slop.
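The idea of slop can be illustrated with a positional window over token positions. This is a simplification of Lucene's actual slop (an edit distance over positions), meant only to show why a large slop widens the search:

```java
// Minimal illustration of slop: two terms "match with slop n" here if
// their token positions are at most n + 1 apart (0 slop = adjacent).
// Lucene's real slop is an edit distance over positions; this sketch
// only conveys the windowing intuition.
public class SlopSketch {
    public static boolean withinSlop(String[] tokens, String a, String b, int slop) {
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(a)) continue;
            for (int j = 0; j < tokens.length; j++) {
                // distance between positions, minus one, must not exceed slop
                if (tokens[j].equals(b) && Math.abs(j - i) - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```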
On May 4, 2013 at 20:41, Karl Wettin wrote:
The simplest solution is to use slop in PhraseQuery, SpanNearQuery,
etc. Also consider permutations of #isInOrder() with alternative query
boosts.
Even though
On Jan 14, 2013 at 14:53, VIGNESH S wrote:
Has anyone used the naive Bayesian classifier?
It would be really helpful if someone could post how to use the
classifiers in Lucene.
Hi there,
I posted an NB classifier in the jira back in 2007 that uses Lucene as the data
matrix. It probably needs a bit of
On Aug 22, 2011 at 18:49, Rich Cariens wrote:
I found a Lucene SSD performance benchmark doc
http://wiki.apache.org/lucene-java/SSD_performance?action=AttachFile&do=view&target=combined-disk-ssd.pdf
but the wiki engine is refusing to let me view the attachment (I get You
are not allowed to do
You'll also need things to exclude from, e.g. a MatchAllDocsQuery.
karl
On Jun 29, 2011 at 17:25, Clemens Wyss wrote:
Say I have a document with field f1. How can I search documents which do
not have test in field f?
I tried:
-f: *test*
f: -*test*
f: NOT *test*
but no luck. Using
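The reason the purely negative attempts above find nothing is that Lucene only collects documents matched by some positive clause; the advice in the reply is to pair the negation with a match-all clause. Sketched here over plain sets instead of a real index (class name is illustrative):

```java
// Why a purely negative query returns nothing: results are gathered
// from positive clauses only. The fix is (all documents) MINUS
// (documents containing the term), i.e. a MatchAllDocsQuery as the
// MUST clause and the wildcard as MUST_NOT. Sets stand in for doc ids.
import java.util.Set;
import java.util.TreeSet;

public class NotQuerySketch {
    public static Set<Integer> allMinus(Set<Integer> allDocs, Set<Integer> matching) {
        Set<Integer> result = new TreeSet<>(allDocs); // MatchAllDocsQuery stand-in
        result.removeAll(matching);                   // MUST_NOT clause stand-in
        return result;
    }
}
```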
Perhaps least frequent substring or even suffix truncation might be enough
for your needs.
Here is a related paper: http://web.jhu.edu/bin/q/b/p75-mcnamee.pdf
karl
On Jun 8, 2011, at 1:52 PM, Mohamed Yahya wrote:
You're right. Still, I am not sure if there is a library that would
On Jan 18, 2011, at 10:04 PM, Grant Ingersoll wrote:
As devs of Lucene/Solr, due to the way ASF mirrors, etc. works, we really
don't have a good sense of how people get Lucene and Solr for use in their
application. Because of this, there has been some talk of dropping Maven
support for
On Nov 22, 2010 at 10:56, jan.kure...@nokia.com wrote:
Using the SearchHandler with the deftype="dismax" option enables the
DisMaxQParserPlugin. From investigating, it seems it just tokenizes by
whitespace.
Although by looking in the code I could not find the place,
There is a SpanFuzzyQuery for Lucene 1.9 from 2006 in LUCENE-522.
karl
On Sep 27, 2010 at 00:19, Fabiano Nunes wrote:
Thank you, Schindler.
When combining queries, I need two strings, one for each field. I want to
use just one string like -- head:hello~ world~3 AND contents:colorless~
in case one term is very frequent and the
other term is very rare.
2010/8/27 Karl Wettin karl.wet...@gmail.com:
My mail client died while sending this mail. Sorry for any duplicate.
It is strange that it should take 20 seconds to gather fields; this is
the only thing that really surprises me. I'd expect it to be instant
compared to RAMDirectory. It is hard to say from the information you
provided.
Hi,
Please define important. Important to do what?
It would probably be helpful if you explained what it is you are trying
to achieve by doing this. Perhaps there is something in MoreLikeThis
that will help you?
karl
On Jul 23, 2010 at 04:44, Xaida wrote:
Hi all!
hmmm, i need to
On Jul 23, 2010 at 08:30, sk...@sloan.mit.edu wrote:
Hi all, I have an interesting problem...instead of going from a query
to a document collection, is it possible to come up with the best fit
query for a given document collection (results)? Best fit being a
query which maximizes the hit scores of
Are you perhaps looking for this:
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/similar/MoreLikeThis.html
?
karl
On Jul 23, 2010 at 10:54, Xaida wrote:
Hi! thanks for reply! I will try to explain better, sorry if it was
unclear.
I have user text document
On Jul 2, 2010 at 08:32, Li Li wrote:
I have an index of
about 8,000,000 documents and the current index size is about 30GB. Is
it possible to use this contrib to speed up my search? I have enough
memory for it.
In order to answer your question you'll need to benchmark using a lot
of typical
NFS to EMC Celera devices (NFS 3).
- The drives are 300 GB fiber-attached at 10,000 RPM.
Thanks,
Ivan
--- On Thu, 4/8/10, Karl Wettin karl.wet...@gmail.com wrote:
From: Karl Wettin karl.wet...@gmail.com
Subject: Re: Lucene Partition Size
To: java-user@lucene.apache.org
Date: Thursday, April 8
On Apr 8, 2010 at 20:05, Ivan Provalov wrote:
We are using Lucene for searching 200+ million documents (periodical
publications). Is there any limitation on the size of the Lucene
index (file size, number of docs, etc.)?
The only such limitation in Lucene I'm aware of is Integer.MAX_VALUE
On Apr 1, 2010 at 11:21, suman.hol...@zapak.co.in wrote:
It's written to do a search within a search, so that the second
search is
constrained by the results of the first query.
If I understand your needs, you could populate a new filter while
collecting search results.
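The filter-while-collecting idea can be sketched as follows: record the doc ids matched by the first query in a bitset, then let the second search only accept documents whose bit is set. Plain arrays stand in for real Lucene hit collection (the class name is illustrative):

```java
// Sketch of "populate a new filter while collecting search results":
// the first query's hits are recorded in a bitset, and the second
// query consults the bitset per candidate document, constraining its
// results to the first result set.
import java.util.BitSet;

public class SearchWithinSearch {
    public static BitSet collect(int[] firstQueryHits, int maxDoc) {
        BitSet filter = new BitSet(maxDoc);
        for (int doc : firstQueryHits) filter.set(doc);
        return filter;
    }

    public static boolean accept(BitSet filter, int doc) {
        return filter.get(doc); // second search checks the filter per doc
    }
}
```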
On Mar 31, 2010 at 10:21, Michael Stoppelman wrote:
I was wondering why the InstantiatedIndex gets very slow as the
number of
documents increases in the index. I've been looking at the source
and have
only found comments saying it's slow when the index is big but not
why. Do
folks just run
On Jan 20, 2010 at 04:58, Guido Bartolucci wrote:
Am I just ignorant and scared of Lucene and too trusting of Oracle
and MySQL?
Since all your comparisons are with relational databases I feel
obligated to say what has been said so many times on this list:
Lucene is an index and not a
Lucene will probably only be helpful if you know what you are looking
for, e.g. that you search for a given person, a given street and given
time intervals.
Is this what you want to do?
If you instead are looking for a way to really extract any person,
street and time interval that a
Have you tried antiword?
http://www.winfield.demon.nl/
karl
On Jan 11, 2010 at 21:04, maxSchlein wrote:
I was looking for an option for Text extraction from a word doc.
Currently I am using POI; however, when there is a table in the doc,
for
each column POI brings back a . The
On Jan 3, 2010 at 13:33, luocanrao wrote:
1. If the readers do not call re-open, will the segment files the
readers see be the ones from after the merge or from before the merge
when optimize() is done?
2. When are old segment files on disk removed? If old segment files
are removed at once after optimize() is done,
how can the
On Jan 3, 2010 at 16:32, Yonik Seeley wrote:
Perhaps this is just a huge index, and not enough of it can be
cached in RAM.
Adding additional clauses to a boolean query incrementally destroys
locality.
104GB of index and 4GB of RAM means you're going to be hitting the
disk constantly. You
On Dec 31, 2009 at 02:19, Erick Erickson wrote:
It is possible to reconstruct a document from the terms, but
it's a lossy process. Luke does this (you can see from the
UI, and the code is available). There's no utility that I know
of to make this easy.
https://issues.apache.org/jira/browse/LUCENE-2144
On Dec 9, 2009 at 23:22, Uwe Schindler wrote:
This is a bug in InstantiatedIndex. The termDoc(null) was added to
get all
documents. This was never implemented in Instantiated Index. Can you
open an
issue?
There may be other queries that fail because
On Oct 29, 2009 at 12:12, m.harig wrote:
I have a doubt in search: I have a word in my index, welcomelucene
(without
spaces); when I search for welcome lucene (with a space), I am not
able to
get the hits. It should pick the document welcomelucene. Is there
any way to
do it? I've used
On Oct 22, 2009 at 20:00, Chris Hostetter wrote:
: I'm thinking a decorator with deletions on top of the original
reader, merged
: with the clone reader using a MultiReader. But this would still
require a new
you don't really mean a clone do you? ... you should just need a very
small index
Hi people,
I have an application in which the users are allowed to make changes
to the database, changes visible only to that user. I.e. they don't
modify the original data, they create a clone of the original. When
the user requests the instance I retrieve the modified clone rather
than
Hi,
you should probably ask yourself why your performance is bad before
looking at solving it by scaling hardware. I.e. what are your
application needs, how do you solve your needs at index/query time, and
how can you replace this with something better? If you tell us a bit
more about
On Oct 14, 2009 at 15:15, Grant Ingersoll wrote:
On Oct 12, 2009, at 10:46 PM, Thomas D'Silva wrote:
I am trying to compute the counts of terms of the documents
returned by running a query using a TermVectorMapper.
I was wondering if anyone knew if there was a faster way to do this
rather
For the case where the text contains mixed languages there are
solutions that simultaneously use morphological rules of two or more
languages. Coveo search does this but I don't know what their solution
looks like. I suppose one way to do it would be to stem all tokens
with all algorithms
Hi Andrew,
I think you are looking for the shingle package in contrib/analyzers.
karl
On Oct 6, 2009 at 13:42, Andrew Zhang wrote:
Hi guys,
The requirement is very simple here, e.g. for this sentence, 'The NBA
formally announced its new *social media* guidelines Wednesday', I
want to
On Oct 6, 2009 at 18:54, David Causse wrote:
David, your timing couldn't be better. Just the other day I proposed
that we deprecate InstantiatedIndexWriter. The sum of the reasons for
this is that I'm a bit lazy. Your mail makes me reconsider.
https://issues.apache.org/jira/browse/LUCENE-1948
enough.
Regards,
Andrew
On Tue, Oct 6, 2009 at 11:51 PM, Karl Wettin karl.wet...@gmail.com
wrote:
Hi Andrew,
I think you are looking for the shingle package in contrib/analyzers.
karl
On Oct 6, 2009 at 13:42, Andrew Zhang wrote:
Hi guys,
The requirement is very simple here, e.g
Hi Ole-Martin,
how many characters were in the url before and after the update?
karl
On Oct 5, 2009 at 10:21, Ole-Martin Mørk wrote:
Hi. I am trying to understand Lucene's scoring algorithm. We're
getting some strange results. First we search for a given page by its
url. We get this
sorry, I meant title.
On Oct 5, 2009 at 11:57, Simon Willnauer wrote:
Ole-Martin, did you mention that you did not change the URL value
but the
title?
simon
On Mon, Oct 5, 2009 at 11:52 AM, Karl Wettin karl.wet...@gmail.com
wrote:
Hi Ole-Martin,
how many characters was it in the url
of the title was increased by
1, from
41 to 42 characters.
--
Ole-Martin Mørk
On Mon, Oct 5, 2009 at 12:39 PM, Karl Wettin karl.wet...@gmail.com
wrote:
sorry, I meant title.
On Oct 5, 2009 at 11:57, Simon Willnauer wrote:
Ole-Martin, did you mention that you did not change the URL value
Use a span near query to add boost for the phrases. If you only want
to add boost for exact phrases (0 slop) you might want to consider
using shingles.
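Shingles are adjacent-token n-grams, which turn an exact phrase into a single indexable term that can be boosted cheaply. A 2-shingle pass can be sketched as follows (class name illustrative, not the contrib ShingleFilter itself):

```java
// Word shingles as suggested above: adjacent-token n-grams. Indexing
// "please divide this sentence" with 2-shingles yields the terms
// "please divide", "divide this", "this sentence", so a 0-slop phrase
// becomes a plain term match.
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    public static List<String> shingles(String[] tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < size; j++) {
                if (j > 0) sb.append(' ');
                sb.append(tokens[i + j]); // join adjacent tokens with a space
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```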
In order to add greater score for a date closer in time you can choose
between a range of solutions depending on your needs. Using a
Not quite sure what you ask for, but I think you want to use a span
near query (for adding boost to phrases) in a disjunction max query
(to define weights of the different fields).
karl
On Oct 1, 2009 at 02:40, mitu2009 wrote:
Hi,
I've 3 records in Lucene index.
Record 1 contains
You could look into modifying the standard tokenizer lexer code to
handle punctuation (there is a patch in the issue tracker for the old
javacc grammar to handle punctuation) and there is also the Gate NLP
project which has a fairly nice sentence splitter you might find
useful. Add a
On Sep 23, 2009 at 17:55, Mindaugas Žakšauskas wrote:
Luke says:
Has deletions? / Optimized? Yes (1614) / No
Very quick response: try optimizing your index and see what happens.
I'll get back to you unless someone beats me to it.
karl
On Sep 23, 2009 at 17:55, Mindaugas Žakšauskas wrote:
I was kind of hinting on the resource planning. Every decent
enterprise application, apart from other things, has to provide its
memory requirements, and my point was - if it uses memory, how much of
it needs to be allocated? What are the
Hi Mindaugas,
it is - as you sort of point out - the readers associated with your
searcher that consume the memory, and not so much the searcher
itself. The things that consume the most memory are probably field norms
(8 bits per field and document unless omitted) and flyweighted terms
On May 28, 2009 at 12:22, Gaurav Kumar wrote:
Hi everyone,
I am doing a project using Lucene where I need to index HTML files.
I am
using Tika to parse HTML files. But I need to index files according
to their
tags, which means that every text present in a different HTML tag (like
p or
a) should
Hi Jeetu,
whether or not it makes sense to use Lucene as your data matrix depends
a bit on your requirements. There is a Bayesian classifier available
in the issue tracker, http://issues.apache.org/jira/browse/LUCENE-1039,
that might be helpful, although it does need a little bit
of
Hi Ravichandra,
this is a question better suited to the java-user mailing list. On this
list we talk about the development of the Lucene API rather than how
to use it.
To answer your question, there is no simple formula that says how much
RAM an InstantiatedIndex will consume given the
. :)
Thanks!
-Nate
On Thu, May 7, 2009 at 7:50 AM, Karl Wettin karl.wet...@gmail.com
wrote:
Nate,
will there always be a corresponding mp3 for any given note sheet?
As for analysis, I'd try using ngrams of the complete untokenized
file name
if I was you.
Michael Jackson Don't Stop 'till You Get
SpellChecker classes
be
of any use?
I really feel like I'm floundering here. I am more than willing to
put
in the work, I just need a push or two in the right directions. :)
Thanks!
-Nate
On Thu, May 7, 2009 at 7:50 AM, Karl Wettin karl.wet...@gmail.com
wrote:
Nate,
will there always
I might be missing something here, but why not just store the index on
a cryptographic virtual file system?
karl
On May 8, 2009 at 19:09, peter_lena...@ibi.com wrote:
Michael,
Thanks for the comments they are very insightful.
I hadn't thought about the Random
Nate,
will there always be a corresponding mp3 for any given note sheet?
As for analysis, I'd try using ngrams of the complete untokenized file
name if I was you.
Michael Jackson Don't Stop 'till You Get Enough -
^mic, mich, icha, chae, hael, ael , el j, l ja, and so
on.
See
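The character-gram scheme in the example above (whole untokenized file name, 4-grams, with a start marker) can be sketched as follows; the class name is illustrative:

```java
// Character 4-grams over the complete untokenized file name, with a
// '^' start marker, matching the "^mic, mich, icha, chae, ..." example
// above. Fuzzy file-name lookups then become gram-overlap matches.
public class FileNameGrams {
    public static java.util.List<String> grams(String name, int n) {
        String marked = "^" + name.toLowerCase();
        java.util.List<String> out = new java.util.ArrayList<>();
        for (int i = 0; i + n <= marked.length(); i++) {
            out.add(marked.substring(i, i + n)); // sliding window of width n
        }
        return out;
    }
}
```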
You should probably tell us the reason why you need this
functionality.
Given you only load the stored comparative field for the first it
doesn't really have to be that expensive. If you know that the first
hit was not a perfect match then you know that any matching documents
with a
For this you probably want to use ngrams. Whether or not this is
something that fits in your current index is hard to say. My guess is
that you want to create a new index with one document per unique
phrase. You might also want to try to load this index in an
InstantiatedIndex, that could
If you use prefix grams only then you'll get a forward-only suggestion
scheme. I've seen several implementations that use that and it works
quite well.
harry potter: ^ha, ^har, ^harr, ^harry, ^harry p, ^harry po..
harry houdini: ^ha, ^har, ^harr, ^harry, ^harry h, ^harry ho..
I prefer the
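The prefix grams listed above can be generated as every anchored prefix of the phrase from some minimum length up (class name illustrative); "^ha" then points at both "harry potter" and "harry houdini", giving forward-only suggestions:

```java
// Edge (prefix) grams as in the example above: every prefix of the
// phrase with a '^' anchor, from minLen characters up. Note this
// sketch also emits prefixes ending in a space (e.g. "^harry "),
// which the hand-written example above skips.
public class PrefixGrams {
    public static java.util.List<String> grams(String phrase, int minLen) {
        java.util.List<String> out = new java.util.ArrayList<>();
        for (int len = minLen; len <= phrase.length(); len++) {
            out.add("^" + phrase.substring(0, len));
        }
        return out;
    }
}
```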
On Apr 6, 2009 at 14:59, Glyn Darkin wrote:
Hi Glyn,
to be able to spell check phrases
E.g
Harry Poter is converted to Harry Potter
We have a fixed dataset so can build indexes/ dictionaries from our
own data.
the most obvious solution is to index your contrib/spellchecker with
shingles. This
On Apr 6, 2009 at 15:47, Lebiram wrote:
I am thinking of adding search filters to my application, thinking
that they would be more efficient.
Can anyone explain what lucene does with search filters?
Like, what generally happens when calling search()
A filter is a bitset, one bit per document in
You can also look at https://issues.apache.org/jira/browse/LUCENE-1039
that I've successfully used for language detection of user queries.
karl
On Mar 27, 2009 at 18:35, Boris Aleksandrovsky wrote:
Lisheng,
You might want to look at the Nutch LanguageID plugin
There is even an old thread about this on the Mahout-users list:
http://markmail.org/message/ludu5hjfczuvgk3n
On Mar 17, 2009 at 15:17, Grant Ingersoll wrote:
Have a look at the Lucene sister project: Mahout: http://lucene.apache.org/mahout
. In there is the Taste collaborative filtering
On Feb 15, 2009 at 16:27, Joel Halbert wrote:
Is there any practical limit on the number of fields that can be
maintained on an index?
My index looks something like this, 1 million documents. For each
group
of 1000 documents I might have 10 indexed fields. This would mean in
total about 1
?
Karl Wettin wrote:
If you attach an NgramTokenFilter to your analyzer at index and
query time you should be able to query for parts of the word.
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
http://lucene.apache.org/java/2_4_0/api/index.html?org
Hi again Jori,
did you try N-grams as suggested in the reply on -dev?
karl
On Feb 13, 2009 at 09:05, d-fader wrote:
Hi,
I've actually posted this message in the dev mailing list earlier,
because I thought my 'issue' is a limitation of the functionality of
Lucene, but they redirected me to
this :)
Jori.
Karl Wettin wrote:
Hi again Jori,
did you try N-grams as suggested in the reply on -dev?
karl
On Feb 13, 2009 at 09:05, d-fader wrote:
Hi,
I've actually posted this message in the dev mailing list earlier,
because I thought my 'issue' is a limitation of the functionality
On Feb 5, 2009 at 14:44, Lebiram wrote:
If HitCollector only returns a document once then he might be
referring to an application ID that is assigned to a field that has
been indexed twice or more with different document IDs.
I'll clarify this with him.
However is there a way to somehow do a
On Feb 5, 2009 at 09:30, Amin Mohammed-Coleman wrote:
Is there a separate part of the lucene document where the tokenised
strings
are stored, so that Lucene knows where to look?
Yes.
Stored fields is meta data bound to a document, for instance the
primary key of the object the Lucene
Hi Eric,
ShingleMatrixFilter does not add some sort of multiple token synonym
feature on top of a plain old Lucene index, it does however create
permutations of tokens in a matrix. My suggestion is that you first
look at what shingles are and make sure this is something you feel is
I think it would be nice with a little payload modification tool in the
SVN.
karl
On Jan 2, 2009 at 23:02, Grant Ingersoll wrote:
I don't think there is any API support for this, but in theory it is
possible, as long as you aren't changing the size. It sounds like
it could work for you
Hello,
the easiest way would be to construct the combined document using the
data from your primary source rather than reconstructing it from the
index. If the source data no longer is available you could still
reconstruct a token stream. The data is however a bit spread out so it
can
On Dec 30, 2008 at 17:13, Lebiram wrote:
Hi Lebiram,
contrib/misc contains a couple of tools that might be of help.
Just wanted to reconstruct a new index based on an existing
index (but turning off norms), that's all.
If you want to create an identical index but without norms use
Hi Israel,
you can solve your problem at search time by passing a custom
Similarity class that looks something like this:
private Similarity similarity = new DefaultSimilarity() {
    public float tf(float v) {
        return 1f;
    }
    public float tf(int i) {
        return 1f;
    }
};
I would very much like to hear how people use payloads.
Personally I use them for weight only. And I use them a lot, almost in
all applications. I factor the weight of synonyms, stems,
dediacritization and what not. I create huge indices that contain
lots of tokens at the same position, but
On Dec 13, 2008 at 06:05, Aaron Schon wrote:
Hi, if I have a Lucene index (or Solr) that is installed on client
premises, how would you go about securing the index from being
queried in an unauthorized fashion? For example, from malicious users
or hackers, or for that matter internal users
Hi Tim,
is it possible that the slow queries contains terms that are very
common in your index? If so you could replace those clauses with a
filter. This would impact the score, as filters contribute nothing to it,
but if your query contains enough other clauses that should not be a
Hello Anees,
the Gdata server was phased out by 2.3. You can still get it from the
2.2 tag in the SVN:
http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_2_0/
karl
On Dec 5, 2008 at 07:13, Anees Haider wrote:
I have set up lucene, test run it and gone through samples.
Now I have been
You could get the 2.4 code and set the serialVersionUID of the Term
class to the UID assigned to the 2.3 Term class (554776219862331599l)
and recompile.
As for statically setting a serialVersionUID in the class, one could
instead set it to a final value and implement Externalizable in
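The workaround above relies on how Java serialization compares the loaded class's serialVersionUID with the one recorded in the stream; pinning the field makes otherwise-identical class versions stream-compatible. A sketch with a hypothetical class (the constant is the 2.3 Term UID quoted above):

```java
// Java serialization matches classes by serialVersionUID. Pinning the
// field to the older release's value (here, the 2.3 Term UID from the
// mail above) keeps streams written by that release readable. The
// PinnedUid class itself is hypothetical, standing in for Term.
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class PinnedUid implements Serializable {
    private static final long serialVersionUID = 554776219862331599L;
    String field;
    String text;
}
```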
by sequence, but
it can't
search as a starts-with (for library inf*)
Karl Wettin wrote:
SpanTermQuery is a TermQuery and not a WildcardQuery. You could use a
SpanRegexQuery. You could also make your own SpanWildcardQuery based
on either WildcardQuery or SpanRegexQuery.
You should probably tell
On Nov 27, 2008 at 10:15, Toke Eskildsen wrote:
On Thu, 2008-11-27 at 07:30 +0100, Karl Wettin wrote:
The most scary part is that you will have to score each and
every
document that has a source, probably all of the documents in your
corpus.
I now see my query-logic was flawed. In order
SpanTermQuery is a TermQuery and not a WildcardQuery. You could use a
SpanRegexQuery. You could also make your own SpanWildcardQuery based
on either WildcardQuery or SpanRegexQuery.
You should probably tell us a bit about the problem you try to solve
rather than asking about the solution
The most scary part is that you will have to score each and every
document that has a source, probably all of the documents in your
corpus. So if you have a very large number of documents it might be a
bit expensive. Also, appending this query for boost only means that
you will get
Alex,
if you have length normalization turned on then the length (the number
of tokens and perhaps even the distance between the tokens) of the
second document is much greater than the length of the first document.
The length is the complete number of tokens in the field, i.e. if you
add
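The length normalization being described can be made concrete: in the classic DefaultSimilarity of that era, the norm is 1/sqrt(number of tokens in the field), so each matching term in a long field contributes less. Sketched as a plain function (an assumption about the default formula, not code from the mail):

```java
// The length normalization explained above: classic DefaultSimilarity
// computes lengthNorm = 1 / sqrt(numTokensInField). A match in a
// 100-token field thus contributes a tenth of what the same match
// contributes in a 1-token field, which is why the longer second
// document scores lower.
public class LengthNormSketch {
    public static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }
}
```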
Hi David,
thanks for the report! I suppose you speak of IndexWriter vs
InstantiatedIndexWriter? These are definitely considered discrepancy
problems. I've created a new issue in the tracker:
http://issues.apache.org/jira/browse/LUCENE-1462
For what reason do you try to serialize the
The actual performance depends on how much you load to the index. Can
you tell us how many documents and how large these documents are that
you have in your index?
Compared with RAMDirectory I've seen performance boosts of
up to 100x in a small index that contains (1-20) Wikipedia sized
On Wed, Nov 19, 2008 at 3:27 AM, karl wettin [EMAIL PROTECTED] wrote:
rewritten query. I.e. this is probably as much a store related expense
as it is a Levenshtein calculation expense.
this is probably *not* as much a store related.. that is.
karl
Hi Darren,
How large is your corpus? The speed you can expect depends on how much
data you load it with. There is a graph in the package level javadocs
that shows this:
http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/org/apache/lucene/store/instantiated/package-summary.html
On Oct 2, 2008 at 14:47, Jimi Hullegård wrote:
But apparently this setOmitNorms(true) disables boosting
as well. That is ok for now, but what if we want to use boosting in
the future? Is there no way to disable the length normalization
while still keeping the boost calculation?
You can
On Sep 24, 2008 at 12:40, Grant Ingersoll wrote:
One side note based on your example, below: Index time boosting
does not have much granularity (only 255 values), in other words,
there is a loss of precision. Thus, you
want to make sure your boosts are different enough such that you can
On Sep 19, 2008 at 11:05, 叶双明 wrote:
Document<stored/uncompressed,indexed<field:abc>>
Document<stored/uncompressed,indexed<field:bcd>>
How can I get the first Document by some query string like a,
ab or
abc, but not by b and bc?
You would create an ngram filter that creates grams from the first
Related, I've been considering filesystem based filters on SSD. That
ought to be rather fast, consume no memory and be as simple as a
RandomAccessFile. I didn't spend too much time on it, gave up when I
couldn't figure out when it made sense to close the file. Perhaps it
would be nice with
On Sep 15, 2008 at 14:08, Dragan Jotanovic wrote:
I made a simple Similarity implementation:
public float tf(float arg0) {
    return 1f;
}
Why do you touch the term frequency? Is that perhaps unrelated to
what's discussed in this thread?
karl
On Sep 15, 2008 at 18:45, Cam Bazz wrote:
I have been looking at instantiated index in the trunk. Does this come
with a searcher?
Pass an InstantiatedIndexReader to the constructor of an IndexSearcher.
Are the adds reflected directly to the index?
Yes. An InstantiatedIndexReader is always
On Sep 15, 2008 at 18:51, Karl Wettin wrote:
Are the adds reflected directly to the index?
Yes. An InstantiatedIndexReader is always current.
You will probably still have to reconstruct your searcher.
I never really looked into what happens if you don't.
The second statement was wrong
Hi Wojciech,
can you please give us a bit more specific information about the meta
data fields that will change? I would recommend you look at
creating filters from your primary persistency for query clauses such
as unread/read, mailbox folders, etc.
karl
On Sep 12, 2008 at 13:57
On Sep 12, 2008 at 12:25, Bogdan Ghidireac wrote:
I have a large index and I want to remove the norms from a field. Is
there a way to do this without reindexing everything ?
You could invoke IndexReader#setNorm(int, String, float) and set the
value to 1f.
karl
On Sep 12, 2008 at 14:51, Wojciech Strzałka wrote:
The most changing fields will be, I think:
Status (read/unread): in fact I'm afraid of this the most - any
mail incoming to the system will need to be
indexed at least twice
This is why I recommended you to use a filter
for the
frequently changing fields.
Karl Wettin wrote:
Hi Wojciech,
can you please give us a bit more specific information about the
meta data fields that will change? I would recommend you look at
creating filters from your primary persistency for query clauses
such as unread/read, mailbox
On Sep 4, 2008 at 14:38, Cam Bazz wrote:
Hello,
This came up before, but - if we were to make a swear word filter,
string
edit distances are no good. For example, words like `shot` are
confused with
`shit`. There is also a problem with words like hitchcock. Apparently
I need
something like
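The shot/shit confusion can be made concrete with a standard Levenshtein implementation: the two words are at edit distance 1, the same distance as an ordinary one-letter typo, so no distance threshold can separate profanity from misspellings:

```java
// Why plain edit distance fails as a swear-word filter, as noted
// above: "shot" and "shit" differ by a single substitution, exactly
// like a genuine misspelling. Standard dynamic-programming
// Levenshtein distance.
public class Levenshtein {
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }
}
```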