Hi all.
I discovered there is a normalise filter now, using ICU's Normalizer2
(org.apache.lucene.analysis.icu.ICUNormalizer2Filter). However, as
this is a filter, various problems can result if used with
StandardTokenizer.
One in particular is half-width Katakana.
Supposing you start out with
On Mon, Jan 17, 2011 at 11:53 AM, Robert Muir rcm...@gmail.com wrote:
On Sun, Jan 16, 2011 at 7:37 PM, Trejkaz trej...@trypticon.org wrote:
So I guess I have two questions:
1. Is there some way to do filtering to the text before
tokenisation without upsetting the offsets reported
On Thu, Jan 20, 2011 at 9:08 AM, Paul Libbrecht p...@hoplahup.net wrote:
Wouldn't it be better to prefer precise matches (a field that is
analyzed with StandardAnalyzer, for example) but also allow matches
that are stemmed?
StandardAnalyzer isn't quite precise, is it? StandardFilter does some
On Fri, Mar 11, 2011 at 10:03 PM, shrinath.m shrinat...@webyog.com wrote:
I am trying to index content within certain HTML tags, how do I index it?
Which is the best parser/tokenizer available to do this ?
This doesn't really answer the question, but I think it will help...
The features you
Hi all.
I'm trying to parallelise writing documents into an index. Let's set
aside the fact that 3.1 is much better at this than 3.0.x... but I'm
using 3.0.3.
One of the things I need to know is the doc ID of each document added
so that we can add them into auxiliary database tables which are
On Tue, Mar 29, 2011 at 11:21 PM, Erick Erickson
erickerick...@gmail.com wrote:
I'm always skeptical of storing the doc IDs since they can
change out from underneath you (just delete even a single
document and optimize).
We never delete documents. Even when a feature request came in to
update
On Sat, Apr 2, 2011 at 7:07 AM, Christopher Condit con...@sdsc.edu wrote:
I see in the JavaDoc for IndexWriterConfig that:
Note that IndexWriter makes a private clone; if you need to
subsequently change settings use IndexWriter.getConfig().
However when I attempt to use the same
On Thu, Apr 14, 2011 at 9:44 PM, shrinath.m shrinat...@webyog.com wrote:
Consider this case :
Lucene index contains documents with these fields :
title
author
publisher
I have coded my app to use MultiFieldQueryParser so that it queries all
fields.
Now if user types something like
On Thu, Apr 28, 2011 at 6:13 PM, Uwe Schindler u...@thetaphi.de wrote:
In general a *newly* created object that was not yet seen by any other
thread is always safe. This is why I said, set all bits in the ctor. This is
easy to understand: Before the ctor returns, the object's contents and all
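Uwe's point about setting all bits in the constructor can be sketched in plain Java (the class and names here are hypothetical, not from the original thread): a final field that is fully initialised before the constructor returns is safely published to any thread that later sees the instance.

```java
import java.util.BitSet;

// Toy illustration: initialise all state inside the ctor, then publish.
// The final field gives the JMM safe-publication guarantee (JLS 17.5):
// any thread that sees the constructed object sees all the bits set.
final class PrebuiltBits {
    private final BitSet bits; // final: freeze action at end of ctor

    PrebuiltBits(int size) {
        BitSet b = new BitSet(size);
        b.set(0, size);  // set all bits before the ctor returns
        this.bits = b;   // no other thread has seen 'this' yet
    }

    boolean get(int index) {
        return bits.get(index);
    }
}

public class SafePublicationDemo {
    public static void main(String[] args) {
        PrebuiltBits p = new PrebuiltBits(8);
        System.out.println(p.get(7)); // true
    }
}
```

The key is that the object must not leak from the constructor; once it is handed to another thread after construction, no synchronisation is needed for the read-only state.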
On Wed, Jun 8, 2011 at 6:52 PM, Elmer evanchaste...@gmail.com wrote:
the parsed query becomes:
'+(title:the) +(title:project desc:project)'.
So, the problem is that docs that have the term 'the' only appearing in
their desc field are excluded from the results.
Subclass MFQP and override
On Wed, Jun 29, 2011 at 2:24 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Here's the issue:
https://issues.apache.org/jira/browse/LUCENE-3255
It's because we read the first 0 int to be an ancient segments file
format, and the next 0 int to mean there are no segments. Yuck!
Hi all.
I created a test using Lucene 2.3. When run, this generates a single token:
public static void main(String[] args) throws Exception {
String string =
"\u0412\u0430\u0441\u0438\u0301\u043B\u044C\u0435\u0432";
StandardAnalyzer analyser = new StandardAnalyzer();
On Fri, Jul 15, 2011 at 10:02 AM, Trieu, Jason T
trieu.ja...@con-way.com wrote:
Hi all,
I read postings about searching for empty fields but did not find any
cases of a successful search using query language syntax itself
(-myField:[* TO *], for example).
We have been using: -myField:*
On Fri, Jul 15, 2011 at 4:45 PM, Uwe Schindler u...@thetaphi.de wrote:
Hi,
The crappy thing is that to actually detect if there are any tokens in the
field
you need to make a TokenStream which can be used to read the first token
and then rewind again. I'm not sure if there is such a thing
Hi all.
I am writing a custom query parser which strongly resembles
StandardQueryParser (I use a lot of the same processors and builders,
with a slightly customised config handler and a completely new syntax
parser written as an ANTLR grammar.) My parser has additional syntax
for span queries.
On Fri, Aug 5, 2011 at 1:57 AM, Jim Swainston
jimswains...@googlemail.com wrote:
So if the Text input is:
Marketing AND Smith OR Davies
I want my program to work out that this should be grouped as the following
(as AND has higher precedence than OR):
(Marketing AND Smith) OR Davies.
I'm
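Jim's grouping requirement can be sketched without any Lucene classes at all (Lucene's flexible query parser module does ship a PrecedenceQueryParser for this, but the sketch below is a standalone illustration, not that parser's code): split the token stream on OR first, so that each OR operand becomes an implicit AND group.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of AND-over-OR precedence: because OR binds loosest, split
// on OR first; everything between ORs forms an AND group. For
// "Marketing AND Smith OR Davies" this yields
// "(Marketing AND Smith) OR Davies".
public class PrecedenceDemo {
    static String group(String input) {
        List<String> orOperands = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String tok : input.split("\\s+")) {
            if (tok.equals("OR")) {
                orOperands.add(joinAnd(current)); // close the AND group
                current = new ArrayList<>();
            } else if (!tok.equals("AND")) {
                current.add(tok); // AND is implicit within a group
            }
        }
        orOperands.add(joinAnd(current));
        return String.join(" OR ", orOperands);
    }

    static String joinAnd(List<String> terms) {
        String joined = String.join(" AND ", terms);
        return terms.size() > 1 ? "(" + joined + ")" : joined;
    }

    public static void main(String[] args) {
        System.out.println(group("Marketing AND Smith OR Davies"));
        // (Marketing AND Smith) OR Davies
    }
}
```

A real parser would build a BooleanQuery tree rather than a string, but the precedence decision is the same: OR at the top, AND groups underneath.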
On Mon, Aug 8, 2011 at 8:58 AM, Michael Sokolov soko...@ifactory.com wrote:
Can you do something approximately equivalent like:
within(5, 'my', and('cat', 'dog')) -
within(5, 'my', within(5, 'cat', 'dog') )
Might not be exactly the same in terms of distances (eg cat x x x my x x x
dog)
On Mon, Aug 8, 2011 at 10:00 AM, Trejkaz trej...@trypticon.org wrote:
within(5, 'my', and('cat', 'dog')) - within(5, 'my', within(10, 'cat',
'dog') )
To extend my example and maybe make it a bit more hellish, take this one:
within(2, A, and(B, or(C, and(D, E
After rewriting both
Hi all.
Suppose I am searching for - 限定
In 3.0, QueryParser would parse this as a phrase query. In 3.3, it
parses it as a boolean query, but offers an option to treat it like a
phrase. Why would the default be not to do this? Surely you would
always want it to become a phrase query.
The new
On Fri, Aug 19, 2011 at 11:05 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:
See LUCENE-2458 for the backstory.
the argument was that while phrase queries were historically generated by
the query parser when a single (whitespace delimited) chunk of query
parser input produced multiple
On Sat, Aug 20, 2011 at 7:00 PM, Robert Muir rcm...@gmail.com wrote:
On Sat, Aug 20, 2011 at 3:34 AM, Trejkaz trej...@trypticon.org wrote:
As an aside, Google's behaviour seems to follow the old way. For
instance, [[ 限定 ]] returns 640,000,000 hits and [[ 限 定 ]] returns
772,000,000
Hi all.
We are using IndexWriter with no limits set and managing the commits
ourselves, mainly so that we can ensure they are done at the same time
as other (non-Lucene) commits.
After upgrading from 3.0 to 3.3, we are seeing a change in
ramSizeInBytes() behaviour where it is no longer resetting
On Wed, Aug 24, 2011 at 4:45 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Hmm... this looks like a side-effect of LUCENE-2680, which was merged
back from trunk to 3.1.
So the problem is, IW recycles the RAM it has allocated, and so this
method is returning the allocated RAM, even
On Sat, Aug 27, 2011 at 2:30 AM, ikoelli...@daegis.com wrote:
Hello,
In our indexes we have a field that is a combination of other various
metadata fields (e.g. subject, from, to, etc.). Each field that is added has
a null position at the beginning. As an example, in Luke the field data
On Mon, Sep 19, 2011 at 3:50 AM, Charlie Hubbard
charlie.hubb...@gmail.com wrote:
Here was the prior API I was calling:
Hits hits = getSearcher().search( query, filter, sort );
The new API:
TopDocs hits = getSearcher().search( query, filter, startDoc +
length, sort );
So
Supposing I have a document with just "hi there" as the text.
If I do a span query like this:
near(near(term('hi'), term('there'), slop=0, forwards),
term('hi'), slop=1, any-direction)
that returns no hits. However, if I do a span query like this:
near(near(term('hi'), term('there'),
On Mon, Dec 19, 2011 at 9:05 PM, Paul Taylor paul_t...@fastmail.fm wrote:
I was looking for a Query that returns all documents that contain a
particular field; it doesn't matter what the value of the field is, just
that the document contains the field.
If you don't care about performance (or if
Hi all.
I want to access a Lucene index remotely. I'm aware of a couple of
options for it which seem to operate more or less at the IndexSearcher
level - send a query, get back results.
But in our case, we use IndexReader directly for building statistics,
which is too slow to do via individual
On Mon, Jan 23, 2012 at 11:31 PM, Jamie ja...@stimulussoft.com wrote:
Ian
Thanks. I'll have to read up about it. I have a lot of comparisons to make,
so cannot precompute the values.
How many is a lot? If it were 100 or so I would still be tempted to do
all 4,950 comparisons and find some
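The 4,950 figure above is just the number of unordered pairs among 100 items, n*(n-1)/2, which is easy to sanity-check:

```java
// Number of pairwise comparisons among n items: each of the n items is
// compared with the n-1 others, and each pair is counted once.
public class PairCount {
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(pairs(100)); // 4950
    }
}
```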
On Fri, Jan 27, 2012 at 10:41 AM, Saurabh Gokhale
saurabhgokh...@gmail.com wrote:
I wanted to check if n-gramming the document contents (space is not the
issue) would do any good for better matching? Currently I see n-grams
mostly used for auto complete or spell checking, but is this useful for
Hi all.
I've found a rather frustrating issue which I can't seem to get to the
bottom of.
Our application will crash with an access violation around the time
when the index is closed, with various indications of what's on the
stack, but the common things being SegmentTermEnum.next and
On Wed, Feb 1, 2012 at 11:30 AM, Robert Muir rcm...@gmail.com wrote:
the problem is caused by searching indexreaders after you closed them.
in general we can try to add more and more safety, but at the end of the day,
if you close an indexreader while a search is running, you will have
On Wed, Feb 1, 2012 at 1:14 PM, Robert Muir rcm...@gmail.com wrote:
No, I don't think you should use close at all, because your problem is
you are calling close() when its unsafe to do so (you still have other
threads that try to search the reader after you closed it).
Instead of trying to
Hi all.
We have 1..N indexes for each time someone adds some data. Each time
they can choose different tokenisation settings. Because of this, each
text index has its own query parser instance. Because each query
parser could generate a different Query (though I guess whether they
do or not is
On Wed, Feb 15, 2012 at 11:46 AM, Uwe Schindler u...@thetaphi.de wrote:
Scores are only compatible if the query is the same, which is not the case
for you.
So you cannot merge hits from different queries.
So I guess in the case where the different query parsers happen to
generate the same
On Mon, Feb 20, 2012 at 12:07 PM, Uwe Schindler u...@thetaphi.de wrote:
See my response. The problem is not in Lucene; it's in general a problem of
fixed
thread pools that execute other callables from within a callable running at
the
moment in the same thread pool. Callables are simply
On Thu, Mar 1, 2012 at 6:20 PM, Sudarshan Gaikaiwari sudars...@acm.org wrote:
Hi
https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/document/DocValuesField.html
The documentation at the above link indicates that the optimal way to
add a DocValues field is to create it
On Fri, Mar 2, 2012 at 6:22 PM, su ha s_han...@yahoo.com wrote:
Hi,
I'm new to Lucene. I've indexed some documents with Lucene and need to
sanitize them to ensure
that they do not have any social security numbers (3-digits 2-digits
4-digits).
(How) Can I write a query (with the QueryParser)
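The SSN shape described (3 digits, 2 digits, 4 digits) can be sketched with a plain java.util.regex pattern, assuming hyphen separators (the separator isn't stated in the original post). Note two caveats if this is turned into a Lucene RegexpQuery: Lucene uses its own regexp syntax (no \d shorthand, so [0-9] is the safe spelling), and RegexpQuery matches individual indexed terms, so a tokenizer that splits on hyphens would break the pattern.

```java
import java.util.regex.Pattern;

// Sketch of the SSN shape from the post: 3 digits, 2 digits, 4 digits,
// assuming hyphen separators.
public class SsnScan {
    static final Pattern SSN = Pattern.compile("[0-9]{3}-[0-9]{2}-[0-9]{4}");

    public static void main(String[] args) {
        System.out.println(SSN.matcher("123-45-6789").matches()); // true
        System.out.println(SSN.matcher("12-345-6789").matches()); // false
    }
}
```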
On Wed, Apr 18, 2012 at 9:27 AM, Vladimir Gubarkov xon...@gmail.com wrote:
Hi, dear Lucene specialists,
The only explanation I could think of is the new TieredMergePolicy
instead of the old LogMergePolicy. Could it be that because
TieredMergePolicy merges non-adjacent segments, this results
On Fri, May 11, 2012 at 9:56 PM, Jong Kim jong.luc...@gmail.com wrote:
2. If Lucene can recycle old IDs, it would be even better if I could force
it to re-use a particular doc ID when updating a document by deleting old
one and creating new one. This scheme will allow me to reference this doc
On Thu, May 17, 2012 at 7:11 AM, Chris Harris rygu...@gmail.com wrote:
but also crazier ones, perhaps like
agreement w/5 (medical and companion)
(dog or dragon) w/5 (cat and cow)
(daisy and (dog or dragon)) w/25 (cat not cow)
[skip]
Everything in your post matches our experience. We ended up
On Fri, May 18, 2012 at 6:23 AM, Jamie Johnson jej2...@gmail.com wrote:
I think you want to have a look at the QueryParser classes. Not sure
which you're using to start with but probably the default QueryParser
should suffice.
There are (at least) two catches though:
1. The semantics of a
On Sat, May 26, 2012 at 12:07 PM, Chris Harris rygu...@gmail.com wrote:
Alternatively, if you insist that query
merger w/5 (medical and agreement)
should match document medical x x x merger x x x agreement
then you can propagate 2x the parent's slop value down to child queries.
This is in
On Fri, Jun 8, 2012 at 5:35 AM, Jack Krupansky j...@basetechnology.com wrote:
Well, if you have defined OR/or and IN/in as stopwords, what is it you expect
other than for the analyzer to ignore those terms (which with a boolean “AND”
means match nothing)?
Is this behaviour really logical?
On Mon, Jul 23, 2012 at 10:16 PM, Deepak Shakya just...@gmail.com wrote:
Hey Jack,
Can you let me know how I should do that? I am using the Lucene 3.6 version
and I don't see any parse() method for StandardAnalyzer.
In your case, presumably at indexing time you should be using a
On Thu, Jul 26, 2012 at 5:38 AM, Simon Willnauer
simon.willna...@gmail.com wrote:
you really shouldn't do that! If you use Lucene as a primary key
generator, why don't you build your own on top? Just add one layer that
accepts the document and returns the PID and internally put it in an
ID
On Thu, Aug 16, 2012 at 11:27 AM, zhoucheng2008 zhoucheng2...@gmail.com wrote:
+(title:21 title:a title:day title:once title:a title:month)
Looks like you have a fairly big boolean query going on here, and some
of the terms you're using are really common ones like "a".
Are you using AND or OR
On Fri, Sep 7, 2012 at 6:12 PM, Jochen Hebbrecht
jochenhebbre...@gmail.com wrote:
Hi qibaoyuan,
I tried your second solution, using the scoring data. I think in this way,
I could use MoreLikeThis. All documents with a score X are a possible
match :-).
FWIW, there is also
On Thu, Sep 20, 2012 at 4:28 AM, vempap phani.vemp...@emc.com wrote:
Hello All,
I've an issue with respect to the distance measure of SpanNearQuery in
Lucene. Let's say I have the following two documents:
DocID: 6, content: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001
1002 1003 1004 1005
On Sat, Oct 27, 2012 at 1:53 PM, Tom fivemile...@gmail.com wrote:
Hello,
using Lucene 4.0.0b, I am trying to get a superset of all stop words (for
an international app).
I have looked around, and not found anything specific. Is this the way to go?
CharArraySet internationalSet = new
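The idea being attempted (a superset of per-language stop word lists) is just a set union; in real Lucene code the inputs would come from the analyzers' static default sets (e.g. EnglishAnalyzer.getDefaultStopSet()) collected into a CharArraySet. The sketch below is a standalone toy using stand-in word lists, not the actual Lucene sets.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch of building one combined stop word set by unioning the
// per-language lists. The three lists here are tiny stand-ins, not the
// real default stop sets.
public class StopwordUnion {
    static Set<String> union(List<Set<String>> perLanguage) {
        Set<String> superset = new HashSet<>();
        for (Set<String> s : perLanguage) {
            superset.addAll(s); // duplicates collapse automatically
        }
        return superset;
    }

    public static void main(String[] args) {
        Set<String> superset = union(List.of(
                Set.of("the", "and", "of"),    // stand-in for English
                Set.of("le", "la", "et"),      // stand-in for French
                Set.of("der", "die", "und"))); // stand-in for German
        System.out.println(superset.size()); // 9
    }
}
```

One caveat worth weighing: a cross-language superset will drop terms that are stop words in one language but meaningful in another (e.g. "die" in English).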
On Mon, Nov 5, 2012 at 4:25 AM, Michael-O 1983-01...@gmx.net wrote:
Continuing my answer from above. Have you ever worked with the Spring
Framework? They apply a very nice exception translation pattern. All
internal exceptions are turned to specialized unchecked exceptions like
On Wed, Nov 7, 2012 at 10:11 PM, Ian Lea ian@gmail.com wrote:
4.0 has maybeRefreshBlocking which is useful if you want to guarantee
that the next call to acquire() will return a refreshed instance.
You don't say what version you're using.
If you're stuck on 3.6.1 can you do something with
On Thu, Nov 8, 2012 at 8:29 AM, Trejkaz trej...@trypticon.org wrote:
It's not only protected... but the class is final as well (the method
might as well be private so that it doesn't give a false sense of hope
that it can be overridden.)
I might have to clone the whole class just to make
I have a feature I wanted to implement which required a quick way to
check whether an individual document matched a query or not.
IndexSearcher.explain seemed to be a good fit for this.
The query I tested was just a BooleanQuery with two TermQuery inside
it, both with MUST. I ran an empty query
On Wed, Nov 21, 2012 at 12:33 AM, Ramprakash Ramamoorthy
youngestachie...@gmail.com wrote:
On Tue, Nov 20, 2012 at 5:42 PM, Danil ŢORIN torin...@gmail.com wrote:
Ironically most of the changes are in unicode handling and standard
analyzer ;)
Ouch! It hurts then ;)
What we did going from 2
On Wed, Nov 21, 2012 at 10:40 AM, Robert Muir rcm...@gmail.com wrote:
Explain is not performant... but the comment is fair I think? It's more of a
worst-case, depends on the query.
Explain is going to rewrite the query/create the weight and so on just to
advance() the scorer to that single doc
I recently implemented the ability for multiple users to open the
index in the same process (whoa, you might think, but this has been
a single user application forever and we're only just making the
platform capable of supporting more than that.)
I found that filters are being stored twice and
On Tue, Nov 27, 2012 at 9:31 AM, Robert Muir rcm...@gmail.com wrote:
On Thu, Nov 22, 2012 at 11:10 PM, Trejkaz trej...@trypticon.org wrote:
As for actually doing the invalidation, CachingWrapperFilter itself
doesn't appear to have any mechanism for invalidation at all, so I
imagine I
On Wed, Nov 28, 2012 at 2:09 AM, Robert Muir rcm...@gmail.com wrote:
I don't understand how a filter could become invalid even though the reader
has not changed.
I did state two ways in my last email, but just to re-iterate:
(1): The filter reflects a query constructed from lines in a text
Hi all.
trying to figure out what I was doing wrong in some of my own code so
I looked to LowerCaseFilter since I thought I remembered it doing this
correctly, and lo and behold, it failed the same test I had written.
Is this a bug or an intentional difference in behaviour?
@Test
public
On Fri, Nov 30, 2012 at 8:22 PM, Ian Lea ian@gmail.com wrote:
Sounds like a side effect of possibly different, locale-dependent,
results of using String.toLowerCase() and/or Character.toLowerCase().
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
specifically
On Tue, Dec 4, 2012 at 10:09 AM, Vitaly Funstein vfunst...@gmail.com wrote:
If you don't need to support case-sensitive search in your application,
then you may be able to get away with adding string fields to your
documents twice - lowercase version for indexing only, and verbatim to
store.
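Vitaly's two-field scheme can be modelled without any Lucene API at all (the class below is a toy in-memory stand-in, not Lucene code): index the lowercased form, keep the verbatim form for display, and lowercase queries the same way at search time.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of the suggestion above: the "indexed" key is lowercased,
// the "stored" value is verbatim. Lookups are case-insensitive but
// results come back with original casing.
public class DualFieldDemo {
    private final Map<String, String> index = new HashMap<>();

    void add(String value) {
        index.put(value.toLowerCase(Locale.ROOT), value); // indexed -> stored
    }

    String search(String query) {
        return index.get(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        DualFieldDemo d = new DualFieldDemo();
        d.add("McCandless");
        System.out.println(d.search("mccandless")); // McCandless
    }
}
```

Using Locale.ROOT on both sides matters: locale-sensitive lowercasing (the Turkish dotless i being the classic case) would otherwise make indexing and querying disagree, which is exactly the LowerCaseFilter discrepancy discussed earlier in this thread.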
On Tue, Dec 4, 2012 at 8:33 PM, BIAGINI Nathan
nathan.biag...@altanis.fr wrote:
I need to send a class containing Lucene elements such as `Query` over the
network using EJB, and of course this class needs to be serialized. I marked
my class as `Serializable` but it does not seem to be enough:
On Sat, Jan 5, 2013 at 4:06 AM, Klaus Nesbigall llg...@gmx.de wrote:
The actual behavior doesn't work either.
The English word 'families' will not be found in case the user types the query
familie*
So why solve the problem by postulating one opinion as right and another as
wrong?
A simple
On Wed, Jan 9, 2013 at 10:57 AM, Steve Rowe sar...@gmail.com wrote:
Trejkaz (and maybe Sai too): ICUTokenizer in Lucene's icu module may be of
interest to you, along with the token filters in that same module. - Steve
ICUTokenizer sounds like it's implementing UAX #29, which is exactly
On Wed, Jan 9, 2013 at 5:25 PM, Steve Rowe sar...@gmail.com wrote:
Dude. Go look. It allows for per-script specialization, with (non-UAX#29)
specializations by default for Thai, Lao, Myanmar and Hebrew. See
DefaultICUTokenizerConfig. It's filled with exactly the opposite of what you
On Tue, Jan 29, 2013 at 3:42 AM, Andrew Gilmartin
and...@andrewgilmartin.com wrote:
When I first started using Lucene, Lucene's Query classes were not suitable
for use with the Visitor pattern, and so I created my own query class
equivalents and other more specialized ones. Lucene's classes
On Thu, Jan 31, 2013 at 11:05 PM, Michael McCandless
luc...@mikemccandless.com wrote:
It's confusing, but you should never try to re-index a document you
retrieved from a searcher, because certain index-time details (eg,
whether a field was tokenized) are not preserved in the stored
document.
Hi all.
We have an application which has been around for so long that it's
still using doc IDs to key to an external database.
Obviously this won't work forever (even in Lucene 3.x we had to use a
custom merge policy to keep it working) so we want to introduce
application IDs eventually. We have
On Tue, Mar 12, 2013 at 10:42 PM, Hu Jing huj@gmail.com wrote:
so my question is how to achieve a non-sorting query method that can
return results incrementally and doesn't traverse all unnecessary docs.
Does Lucene supply some strategies to implement this?
If you want the result as soon as
On Wed, Jul 10, 2013 at 12:53 AM, Uwe Schindler u...@thetaphi.de wrote:
Hi,
there is no more locale-based sorting in Lucene 4.x. It was deprecated in 3.x,
so you should get a warning about deprecation already!
I wasn't sure about this because we are on 3.6 and I didn't see a
deprecation
On Wed, Jul 10, 2013 at 4:20 PM, Uwe Schindler u...@thetaphi.de wrote:
Hi,
The fast replacement (means sorting works as fast without collating) is to
index the fields
used for sorting with CollationKeyAnalyzer ([snip]). The Collator you get
from e.g. the locale.
[snip]
The better way is,
Hi all.
Is there some kind of callback where we can be notified about commits?
Sometimes a call to commit() doesn't actually commit anything (e.g. if
there is nothing in memory at the time.) I'm not really sure what's
wrong with assuming it does commit something, because it's another
developer
On Mon, Sep 2, 2013 at 4:10 PM, Ankit Murarka
ankit.mura...@rancoretech.com wrote:
There's a reason why the Writer is being opened every time inside a while
loop. I usually open the writer in the main method itself as suggested by
you and pass a reference to it. However what I have observed is that if my
The current ordering of JapaneseAnalyser's token filters is as follows:
1. JapaneseBaseFormFilter
2. JapanesePartOfSpeechStopFilter
3. CJKWidthFilter (similar to NormaliseFilter)
4. StopFilter
5. JapaneseKatakanaStemFilter
6. LowerCaseFilter
Our existing support for
In 3.6.2, I notice MultiFieldAttribute is deprecated. So I looked to
the docs to find the replacement:
https://lucene.apache.org/core/3_6_2/api/contrib-queryparser/org/apache/lucene/queryParser/standard/config/MultiFieldAttribute.html
...and the Deprecated note doesn't say what we're supposed to
On Sat, Jan 25, 2014 at 4:29 AM, Olivier Binda olivier.bi...@wanadoo.fr wrote:
I would like to serialize a query into a string (A) and then to unserialize
it back into a query (B)
I guess that a solution is
A) query.toString()
B) StandardQueryParser().parse(query,)
If your custom query
On Mon, Jan 27, 2014 at 3:48 AM, Andreas Brandl m...@3.141592654.de wrote:
Is there some limitation on the length of fields? How do I get around this?
[cut]
My overall goal is to index (arbitrary sized) text files and run a regular
expression search using lucene's RegexpQuery. I suspect the
Hi all.
I'm trying to find a precise and reasonably efficient way to highlight
all occurrences of terms in the query, only highlighting fields which
match the corresponding fields used in the query. This seems like it
would be a fairly common requirement in applications. We have an
existing
On Wed, Feb 5, 2014 at 4:16 AM, Earl Hood e...@earlhood.com wrote:
Our current solution is to do highlighting on the client-side. When
search happens, the search results from the server includes the parsed
query terms so the client has an idea of which terms to highlight vs
trying to
On Thu, Feb 20, 2014 at 1:43 PM, Jamie Johnson jej2...@gmail.com wrote:
Is there a way to limit the fields a user can query by when using the
standard query parser or a way to get all fields/terms that make up a query
without writing custom code for each query subclass?
If you mean
On Tue, Mar 4, 2014 at 4:44 AM, Jack Krupansky j...@basetechnology.com wrote:
What is the hex value for that second character returned that appears to
display as an apostrophe? Hex 92 (decimal 146) is listed as Private Use
2, so who knows what it might display as.
Well, if they're dealing
On Mon, Jun 9, 2014 at 7:57 PM, Jamie ja...@mailarchiva.com wrote:
Greetings
Our app currently uses language specific analysers (e.g. EnglishAnalyzer,
GermanAnalyzer, etc.). We need an option to disable stemming. What's the
recommended way to do this? These analyzers do not include an option
Hi all.
The inability to read people's existing indexes is essentially the
only thing stopping us upgrading to v4, so we're stuck indefinitely on
v3.6 until we find a way around this issue.
As I understand it, Lucene 4 added the notion of codecs which can
precisely choose how to read and write
On Mon, Jun 9, 2014 at 10:17 PM, Adrien Grand jpou...@gmail.com wrote:
Hi,
It is not possible to read 2.x indices from Lucene 4, even with a
custom codec. For instance, Lucene 4 needs to hook into
SegmentInfos.read to detect old 3.x indices and force the use of the
Lucene3x codec since these
Someone asked if it was possible to do a SpanNearQuery between a
TermQuery and a MultiPhraseQuery.
Sadly, you can only use SpanNearQuery with other instances of
SpanQuery, so we have a gigantic method where we rewrite as many
queries as possible to SpanQuery. For instance, TermQuery can
trivially
Unrelated to my previous mail to the list, but related to the same
investigation...
The following test program just indexes a phrase of nonsense words and
then queries for one of the words using the same analyser.
The same analyser is being used both for indexing and for querying,
yet in
Also in case it makes a difference, we're using Lucene v3.6.2.
TX
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
On Tue, Aug 19, 2014 at 5:27 PM, Uwe Schindler u...@thetaphi.de wrote:
Hi,
You forgot to close (or commit) IndexWriter before opening the reader.
Huh? The code I posted is closing it:
try (IndexWriter writer = new IndexWriter(directory,
new IndexWriterConfig(Version.LUCENE_36,
Lucene 4.9 gives much the same result.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
It seems like nobody knows the answer, so I'm just going to file a bug.
TX
Bit of thread necromancy here, but I figured it was relevant because
we get exactly the same error.
On Thu, Jan 19, 2012 at 12:47 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Hmm, are you certain your RAM buffer is 3 MB?
Is it possible you are indexing an absurdly enormous
On Wed, Nov 26, 2014 at 2:09 PM, Erick Erickson erickerick...@gmail.com wrote:
Well
2) seriously consider the utility of indexing a 100+M file. Assuming
it's mostly text, lots and lots and lots of queries will match it, and
it'll score pretty low due to length normalization. And you probably
On Sun, Feb 8, 2015 at 9:04 PM, Uwe Schindler u...@thetaphi.de wrote:
Hi,
Lucene does not use algebraic / boolean logic! Maybe review this blog
post: https://lucidworks.com/blog/why-not-and-or-and-not/
This article is an old classic.
The plus, minus, nothing operators aren't without their
Hi all.
The Lucene 4 migration guide helpfully suggests to work with
BytesRef directly rather than converting to string, but I disagree.
Take the following example of building up a List<Term> by iterating a
TermsEnum. I think it is written in a fairly straight-forward fashion.
I added some println
On Wed, May 20, 2015 at 5:12 PM, András Péteri
apet...@b2international.com wrote:
As Olivier wrote, multiple BytesRef instances can share the underlying byte
array when representing slices of existing data, for performance reasons.
BytesRef#clone()'s javadoc comment says that the result will
On Thu, May 21, 2015 at 9:44 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:
If you really feel strongly about this, and want to advocate for more
consistency around the meaning/implementation of clone() in Java APIs,
I suggest you take it up with the OpenJDK project, and focus on a more
Hi all.
We had been going for the longest time abusing Lucene's doc IDs as our
own IDs and of course all our filters still work like this. But at the
moment, we're looking at ways to break our dependencies on this.
One of the motivators for this is the outright removal of FieldCache
in Lucene 5.
Hi all.
I know with older Lucene there was a recommendation never to use
Version.CURRENT because it would break backwards compatibility.
So we changed all our code over to call, for instance, new
StandardTokenizer(Version.LUCENE_36, createReader()).
Now StandardTokenizer(Version, Reader) is
On Sat, May 30, 2015 at 9:33 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:
My best understanding based on what I see in the current code, is that if
you care about backcompat:
* you must call setVersion() on any *Analyzer* instances you construct
before using them
* you must *not*