Hi Phani,
Assuming you're using Lucene 3.6.X, see:
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/lucene/core/src/java/org/apache/lucene/analysis/standard/READ_BEFORE_REGENERATING.txt
and
Hi Martin,
SnowballAnalyzer was deprecated in Lucene 3.0.3 and will be removed in Lucene
5.0.
Looks like you're using Lucene 3.X; here's an (untested) Analyzer based on
Lucene 3.6's EnglishAnalyzer, except substituting SnowballFilter for the
Porter stemmer and disabling stopword holes' position
Hi Vitaly,
Info here should help you set up snapshot dependencies:
http://wiki.apache.org/lucene-java/NightlyBuilds
Steve
-Original Message-
From: Vitaly Funstein [mailto:vfunst...@gmail.com]
Sent: Saturday, July 21, 2012 9:22 PM
To: java-user@lucene.apache.org
Subject: Re:
Nabble silently drops content from email sent through their interface on a
regular basis. I've told them about it multiple times. My suggestion: find
another way to post to this mailing list.
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent:
I added the following to both TestStandardAnalyzer and TestClassicAnalyzer in
branches/lucene_solr_3_6/, and it passed in both cases:
public void testWhitespaceHyphenWhitespace() throws Exception {
  BaseTokenStreamTestCase.assertAnalyzesTo
      (a, "drinks - water", new String[]{"drinks",
:53:29 PDT 2011 x86_64
Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz GenuineIntel GNU/Linux
On 08/05/12 11:24, Steven A Rowe wrote:
Hi Greg,
I don't see that problem - 'ant generate-maven-artifacts' just works for me.
I suspect that the XSLT processor included with your JDK does not support
If you use the Lucene/Solr Maven POMs to drive the build, I committed a major
change last night (see https://issues.apache.org/jira/browse/LUCENE-3948 for
more details):
* 'ant get-maven-poms' no longer places pom.xml files under the lucene/ and
solr/ directories. Instead, they are placed in
] at org.apache.tools.ant.Main.startAnt(Main.java:217)
[copy] at
org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
[copy] at
org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
On 08/05/12 10:31, Steven A Rowe wrote:
If you use the Lucene/Solr Maven POMs to drive
Hi Dawn,
Can you give an example of a partial match?
Steve
-Original Message-
From: Dawn Zoë Raison [mailto:d...@digitorial.co.uk]
Sent: Friday, April 20, 2012 7:59 AM
To: java-user@lucene.apache.org
Subject: Highlighter and Shingles...
Hi,
Are there any notes on making the
Hi Vladimir,
The most uncomfortable thing in the new behaviour to me is that in the
past I used to search by subdomain like bbb.com: and have results
displayed with www.bbb.com:, aaa.bbb.com: and so on. Now I have 0
results.
About domain names, see my response to a similar question today on
Hi Hanu,
Depending on the nature of the partial word match you're looking for - do you
want to only match partial words that match at the beginning of the word? - you
should look either at NGramTokenFilter or EdgeNGramTokenFilter:
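To make the edge n-gram idea concrete, here's a rough plain-Java sketch (class and method names invented for illustration; Lucene's real EdgeNGramTokenFilter operates on a TokenStream, not on bare strings):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: emit prefixes ("edge n-grams") of a term, so a partial
// query can match the beginning of an indexed word.
public class EdgeNgramSketch {
    static List<String> edgeNgrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, term.length()); n++) {
            grams.add(term.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Indexing "sea" as a gram of "search" lets the partial query "sea" match.
        System.out.println(edgeNgrams("search", 1, 3));
    }
}
```

NGramTokenFilter would instead emit substrings starting at every position, matching partial words anywhere in a word, at the cost of a much larger index.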
Hi okayndc,
What *do* you want?
Steve
-Original Message-
From: okayndc [mailto:bodymo...@gmail.com]
Sent: Thursday, April 05, 2012 1:34 PM
To: java-user@lucene.apache.org
Subject: HTML tags and Lucene highlighting
Hello,
I currently use Lucene version 3.0...probably need to upgrade
(in the field
configured to use HTMLStripCharFilter, anyway).
So HTMLStripCharFilter should do what you want.
Steve
From: okayndc [mailto:bodymo...@gmail.com]
Sent: Thursday, April 05, 2012 3:36 PM
To: Steven A Rowe
Cc: java-user@lucene.apache.org
Subject: Re: HTML tags and Lucene highlighting
Hi Nilesh,
Which version of Lucene are you using? StandardTokenizer behavior changed in
v3.1.
Steve
-Original Message-
From: Nilesh Vijaywargiay [mailto:nilesh.vi...@gmail.com]
Sent: Tuesday, March 27, 2012 2:04 PM
To: java-user@lucene.apache.org
Subject: Lucene tokenization
I have
Hi Ilya,
What analyzers are you using at index-time and query-time?
My guess is that you're using an analyzer that includes punctuation in the
tokens it emits, in which case your index will have things like "sentence."
and "sentence?" in it, so querying for "sentence" will not match.
Luke can tell
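The mismatch Steve describes can be illustrated in plain Java (the whitespace split below is a naive stand-in for whatever punctuation-keeping analyzer is actually in use; class and method names are invented for this sketch):

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a whitespace-only tokenizer keeps trailing punctuation
// attached to tokens, so the indexed term is "sentence." rather than "sentence".
public class PunctuationMismatchSketch {
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> indexed = whitespaceTokens("This is a sentence. He left.");
        System.out.println(indexed.contains("sentence."));  // indexed form is present
        System.out.println(indexed.contains("sentence"));   // bare query term is not
    }
}
```

Using the same analyzer at index time and query time avoids the mismatch entirely.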
index are UTF8.
I am using the standard analyzer for English text and other contributed
analyzers for respective foreign texts
Thanks,
Ilya
-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: Monday, March 26, 2012 10:59 AM
To: java-user@lucene.apache.org
Subject
On 3/26/2012 at 12:21 PM, Ilya Zavorin wrote:
I am not seeing anything suspicious. Here's what I see in the HEX:
n.e from pain.electricity: 6E-2E-0D-0A-0D-0A-65
(n-.-CR-LF-CR-LF-e) e.H from sentence.He: 65-2E-0D-0A-48
I agree, standard DOS/Windows line endings.
I am pretty sure I am using
IndexReader.openIfChanged in Lucene 4.0?
On Mon, Mar 5, 2012 at 11:07 AM, Steven A Rowe sar...@syr.edu wrote:
The second item in the top section in trunk CHANGES.txt (back compat policy
changes):
Could you guys put this on the web site (or a link to it)? Or try to get it to
SEO more prominently?
* LUCENE
You want the lucene-queryparser jar. From trunk MIGRATE.txt:
* LUCENE-3283: Lucene's core o.a.l.queryParser QueryParsers have been
consolidated into module/queryparser,
where other QueryParsers from the codebase will also be placed. The
following classes were moved:
-
PatternReplaceCharFilter would probably work, or maybe a custom CharFilter?
*CharFilter has the advantage of preserving original text offsets, for
highlighting.
Steve
-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com]
Sent: Monday, February 27, 2012 12:57 PM
To:
UAX29URLEmailTokenizer (see http://goo.gl/evH97). There is no Analyzer
available that uses this Tokenizer, but you can define your own like
StandardAnalyzer, only with this class as the Tokenizer (instead of
StandardTokenizer).
I am not sure why there is no Analyzer implementation already available,
maybe Steven Rowe
Hi Paul,
Lucene QueryParser splits on whitespace and then sends individual words
one-by-one to be analyzed. All analysis components that do their work based on
more than one word, including ShingleFilter and SynonymFilter, are borked by
this. (There is a JIRA issue open for the QueryParser
Hi Sudarshan,
I think this wiki page has the info you want:
http://wiki.apache.org/lucene-java/HowNightlyBuildsAreMade
Steve
-Original Message-
From: sudarsh...@gmail.com [mailto:sudarsh...@gmail.com] On Behalf Of
Sudarshan Gaikaiwari
Sent: Tuesday, February 14, 2012 10:01 PM
To:
Hi Damerian,
One way to handle your scenario is to hold on to the previous token, and only
emit a token after you reach at least the second token (or at end-of-stream).
Your incrementToken() method could look something like:
1. Get current attributes: input.incrementToken()
2. If previous
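The hold-one-token pattern sketched above, in plain Java (names invented; a real Lucene filter would do this inside incrementToken() over attributes, e.g. with captureState()/restoreState(), rather than over an Iterator):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustration only: hold the previous token and emit it only once the next
// token (or end-of-stream) is known, so the filter can look ahead by one.
public class OneTokenLookaheadSketch {
    static List<String> emitWithLookahead(Iterator<String> input) {
        List<String> out = new ArrayList<>();
        String previous = null;
        while (input.hasNext()) {
            String current = input.next();
            if (previous != null) {
                // A real filter could inspect 'current' here before deciding
                // how (or whether) to emit 'previous'.
                out.add(previous);
            }
            previous = current;
        }
        if (previous != null) {
            out.add(previous);  // flush the held token at end-of-stream
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emitWithLookahead(List.of("a", "b", "c").iterator()));
    }
}
```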
To: java-user@lucene.apache.org
Subject: Re: Access next token in a stream
On 9/2/2012 8:54 PM, Steven A Rowe wrote:
Hi Damerian,
One way to handle your scenario is to hold on to the previous token, and
only emit a token after you reach at least the second token (or at end
Message-
From: Damerian [mailto:dameria...@gmail.com]
Sent: Thursday, February 09, 2012 5:00 PM
To: java-user@lucene.apache.org
Subject: Re: Access next token in a stream
On 9/2/2012 10:51 PM, Steven A Rowe wrote:
Damerian,
The technique I mentioned would work for you
Hi Dawn,
I assume that when you refer to the impact of stop words, you're concerned
about query-time performance? You should consider the possibility that
performance without removing stop words is good enough that you won't have to
take any steps to address the issue.
That said, there are
Hi Paul,
On 10/19/2011 at 5:26 AM, Paul Taylor wrote:
On 18/10/2011 15:25, Steven A Rowe wrote:
On 10/18/2011 at 4:57 AM, Paul Taylor wrote:
On 18/10/2011 06:19, Steven A Rowe wrote:
Another option is to create a char filter that substitutes
PUNCT-EXCLAMATION for exclamation points
Hi Paul,
What version of Lucene are you using? The JFlex spec you quote below looks
pre-v3.1?
Steve
-Original Message-
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Wednesday, October 19, 2011 6:50 AM
To: Steven A Rowe; java-user@lucene.apache.org 'java-
u
Hi Paul,
On 10/18/2011 at 4:57 AM, Paul Taylor wrote:
On 18/10/2011 06:19, Steven A Rowe wrote:
Another option is to create a char filter that substitutes
PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods,
etc.,
Yes that is how I first did it
No, I don't think you did
Hi Paul,
You could add a rule to the StandardTokenizer JFlex grammar to handle this
case, bypassing its other rules.
Another option is to create a char filter that substitutes PUNCT-EXCLAMATION
for exclamation points, PUNCT-PERIOD for periods, etc., but only when the
entire input consists
Hi Peyman,
The API docs give a hint
http://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/index/IndexWriter.html:
=
Nested Class Summary
...
static class IndexWriter.MaxFieldLength
Deprecated. use LimitTokenCountAnalyzer instead.
=
Hi sbs,
Solr's WordDelimiterFilterFactory does what you want. You can see a
description of its function here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory.
WordDelimiterFilter, the filter class implementing the above factory's
functionality, is
Hi Eric,
On 7/24/2011 at 3:07 AM, Eric Charles wrote:
0112233445566778
12345678901234567890123456789012345678901234567890123456789012345678901234567890
Jenkins jobs builds lucene trunk with 'mvn --batch-mode
--non-recursive
This slide show is a few years old, but I think it might be a good introduction
for you to the differences between the projects:
http://www.slideshare.net/dnaber/apache-lucene-searching-the-web-and-everything-else-jazoon07/
Steve
-Original Message-
From: Ing. Yusniel Hidalgo Delgado
Ant is the official Lucene/Solr build system. Snapshot and release artifacts
are produced with Ant.
While Maven is capable of producing artifacts, the artifacts produced in this
way may not be the same as the official Ant artifacts. For this reason: no,
the artifacts should not be built with
Hi Hamada,
Do you know about the Lucene demo?:
http://lucene.apache.org/java/3_2_0/demo.html
Steve
-Original Message-
From: hamadazahera [mailto:hamadazah...@gmail.com]
Sent: Saturday, June 18, 2011 9:30 AM
To: java-user@lucene.apache.org
Subject: Lucene Simple Project
Hello
Hi Ivan,
You do have rights to submit fixes to Lucene - everyone does!
Here's how: http://wiki.apache.org/lucene-java/HowToContribute
Please create a patch, create an issue in JIRA, and then attach the patch to
the JIRA issue. When you do this, you are asked to state that you grant
license
Hi WeiWei,
Thanks for the report.
Can you provide a self-contained unit test that triggers the bug?
Thanks,
Steve
-Original Message-
From: Weiwei Wang [mailto:ww.wang...@gmail.com]
Sent: Monday, May 23, 2011 1:25 AM
To: java-user@lucene.apache.org
Subject: FastVectorHighlighter
Hi Renaud,
That's normal behavior, since you have AND as default operator. This is
equivalent to placing a + in front of every element of your query. In fact,
if you removed the other two +s, you would get the same behavior. I think
you'll get what you want by just switching the default
Hi Renaud,
On 5/20/2011 at 1:58 PM, Renaud Delbru wrote:
As said in
http://lucidworks.lucidimagination.com/display/LWEUG/Boolean+Operators,
if one or more of the terms in a term list has an explicit term operator
(+ or - or relational operator) the rest of the terms will be treated as
Hi Cheng,
Lucene 3.3 does not exist - do you mean branches/branch_3x ?
FYI, as of Lucene 3.1, there is an Ant target you can use to setup an Eclipse
project for Lucene/Solr - run this from the top level directory of a full
source tree (including dev-tools/ directory) checked out from
, 2011 10:48 AM
To: java-user@lucene.apache.org
Cc: Steven A Rowe
Subject: RE: Lucene 3.3 in Eclipse
Steve, thanks for correction. You are right. The version is 3.0.3
released last Oct.
I did place an ant jar in Eclipse, and it does the job to remove some
compiling errors. However, it seems
A thought: one way to do #1 without modifying ShingleFilter: if there were a
StopFilter variant that accepted regular expressions instead of a stopword
list, you could configure it with a regex like /_ .*|.* _| _ / (assuming a full
match is required, i.e. implicit beginning and end anchors),
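The effect of such a regex-based stop filter can be checked in plain Java (class and method names invented for this sketch; a real implementation would be a TokenFilter, and "_" is ShingleFilter's default filler token):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: drop any shingle that fully matches the filler-token
// regex from the post, leaving only shingles built from real words.
public class FillerShingleDropSketch {
    static final Pattern FILLER = Pattern.compile("_ .*|.* _| _ ");

    static List<String> dropFillerShingles(List<String> shingles) {
        List<String> kept = new ArrayList<>();
        for (String shingle : shingles) {
            if (!FILLER.matcher(shingle).matches()) {
                kept.add(shingle);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Bigrams of "one two three four five" with stopword "three":
        System.out.println(dropFillerShingles(
                List.of("one two", "two _", "_ four", "four five")));
    }
}
```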
-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, May 12, 2011 1:15 PM
To: java-user@lucene.apache.org
Subject: Re: Can I omit ShingleFilter's filler tokens
On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe sar...@syr.edu wrote:
A thought: one way to do #1 without modifying
Hi Bill,
I can think of two possible interpretations of removing filler tokens:
1. Don't create shingles across stopwords, e.g. for text "one two three four
five" and stopword "three", bigrams only, you'd get ("one two", "four five"),
instead of the current ("one two", "two _", "_ four", "four five").
2.
, Steven A Rowe sar...@syr.edu wrote:
Hi Bill,
I can think of two possible interpretations of removing filler
tokens:
1. Don't create shingles across stopwords, e.g. for text "one two three
four five" and stopword "three", bigrams only, you'd get ("one two",
"four five"), instead of the current ("one
Hi Paul,
What did you find about Luke that's buggy? Bug reports are very useful; please
contribute in this way.
The official Lucene 3.0.3 distribution jars were compiled using the -g cmdline
argument to javac - by default, though, only line number and source file
information is generated.
Hi Paul,
On 4/29/2011 at 4:14 PM, Paul Taylor wrote:
On 29/04/2011 16:03, Steven A Rowe wrote:
What did you find about Luke that's buggy? Bug reports are very
useful; please contribute in this way.
Please see previous post, in summary mistake on my part.
Okay... Which previous post? I
Thanks Dawid. – Steve
From: dawid.we...@gmail.com [mailto:dawid.we...@gmail.com] On Behalf Of Dawid
Weiss
Sent: Friday, April 29, 2011 4:45 PM
To: java-user@lucene.apache.org
Cc: Steven A Rowe
Subject: Lucene 3.0.3 with debug information
This is the e-mail you're looking for, Steven (it wasn't
Ranjit,
The problem is definitely the analyzer you are passing to QueryParser or
MultiFieldQueryParser, and not the parser itself.
The following tests succeed using KeywordAnalyzer, which is a pass-through
analyzer (the output is the same as the input):
public void testSharpQP() throws
Hi Ranjit,
I suspect the problem is not QueryParser, since the TERM definition includes
the '#' character (from
http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/src/java/org/apache/lucene/queryParser/QueryParser.jj?view=markup#l1136):
| <#_TERM_START_CHAR: ( ~[ " ", "\t", "\n", "\r",
Hi Ranjit,
Do you know about Luke? It will let you see what's in your index, and much
more:
http://code.google.com/p/luke/
Steve
-Original Message-
From: Ranjit Kumar [mailto:ranjit.ku...@otssolutions.com]
Sent: Tuesday, April 12, 2011 9:05 AM
To:
Hi Tanuj,
Can you be more specific?
What file did you download? (Lucene 3.1 has three downloadable packages:
-src.tar.gz, .tar.gz, and .zip.)
What did you expect to find that is not there? (Some examples would help.)
Steve
-Original Message-
From: Tanuj Jain
Hi Shambhu,
ShingleFilter will construct word n-grams:
http://lucene.apache.org/java/3_1_0/api/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html
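As a rough plain-Java illustration of what ShingleFilter's word n-grams look like (invented names; the real filter operates on a TokenStream and can also emit unigrams and filler tokens):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: build word n-grams ("shingles") from a token list.
public class WordShingleSketch {
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(List.of("please", "divide", "this"), 2));
    }
}
```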
Steve
-Original Message-
From: sham singh [mailto:shamsing...@gmail.com]
Sent: Tuesday, April 05, 2011 5:53 PM
To:
Hi Alex,
From Lucene contrib CHANGES.html
http://lucene.apache.org/java/3_1_0/changes/Contrib-Changes.html#3.1.0.changes_in_backwards_compatibility_policy:
3. LUCENE-2226: Moved contrib/snowball functionality into
contrib/analyzers. Be sure to remove any old obsolete
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[x] I/we build them from source via an SVN/Git checkout.
Hi Paul,
I saw this yesterday, but haven't tried it myself:
http://karussell.wordpress.com/2010/10/27/feeding-solr-with-its-own-logs/
The author has a project called Sogger - Solr + Logger? - that can read
various forms of logs.
Steve
-Original Message-
From: Paul Libbrecht
On 12/22/2010 at 2:38 AM, Ganesh wrote:
Any other tips targeting 64 bit?
If memory usage is an issue, you might consider using HotSpot's compressed
oops option:
http://wikis.sun.com/display/HotSpotInternals/CompressedOops
Hi Manjula,
It's not terribly clear what you're doing here - I got lost in your description
of your (two? or maybe four?) classes. Sometimes things are easier to
understand if you provide more concrete detail.
I suspect that you could benefit from reading the book Lucene in Action, 2nd
NFS[1] != NTFS[2]
[1] NFS: http://en.wikipedia.org/wiki/Network_File_System_%28protocol%29
[2] NTFS: http://en.wikipedia.org/wiki/NTFS
-Original Message-
From: Pulkit Singhal [mailto:pulkitsing...@gmail.com]
Sent: Wednesday, November 10, 2010 2:55 PM
To: java-user@lucene.apache.org
Hi Martin,
StandardTokenizer and -Analyzer have been changed, as of the upcoming version
3.1 (the next release), to support the Unicode segmentation rules in UAX#29.
My (untested) guess is that your hyphenated word will be kept as a single
token if you set the version to 3.1 or higher in the
for a
StandardAnalyzer has Version_30 as its highest value. Do you know when 3.1
is due?
-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: 24 Oct 2010 21:31
To: java-user@lucene.apache.org
Subject: RE: Use of hyphens in StandardAnalyzer
Hi Martin,
StandardTokenizer
Hi Sirish,
StandardTokenizer does not produce a token from '#', as you suspected.
Something that fits the word definition, but which won't ever be encountered
in your documents, is what you should use for the delimiter - something like
"a1b2c3c2b1a".
Sentence boundary handling is clunky in
Hi Sirish,
I think I understand "within sentence phrase search" - you want the entire
phrase to be within a single sentence. But can you give an example of "non
sentence specific phrase search"? It's not clear to me how useful such a
capability would be.
Steve
-Original Message-
From:
Hi Sirish,
Have you looked at SpanQuery's yet?:
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/spans/package-summary.html
See also this Lucid Imagination blog post by Mark Miller:
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
One common technique,
This is not a defect:
http://wiki.apache.org/lucene-java/LuceneFAQ#Does_Lucene_allow_searching_and_indexing_simultaneously.3F.
-Original Message-
From: Justin [mailto:cry...@yahoo.com]
Sent: Monday, October 04, 2010 2:03 PM
To: java-user@lucene.apache.org
Subject: Updating documents
: Steven A Rowe sar...@syr.edu
To: java-user@lucene.apache.org java-user@lucene.apache.org
Sent: Mon, October 4, 2010 1:05:36 PM
Subject: RE: Updating documents with fields that aren't stored
This is not a defect:
http://wiki.apache.org/lucene-
java/LuceneFAQ
Hi Iam,
Can you say why you don't like the proposed solution?
Also, the example of the scoring you're looking for doesn't appear to be
hierarchical in nature - can you illustrate the relationship between the
tokens in [token1, token2, token3]? Also, why do you want token1 to contribute
, 2010 at 12:52 PM, Steven A Rowe sar...@syr.edu wrote:
Hi Iam,
Can you say why you don't like the proposed solution?
Also, the example of the scoring you're looking for doesn't appear to be
hierarchical in nature - can you illustrate the relationship between
the tokens in [token1
Hi Christoph,
There could be several things going on, but it's difficult to tell without more
information.
Since excluded terms require a non-empty set from which to remove documents at
the same boolean clause level, you could try something like title:(*:*
-Datei*) avl, or -title:Datei*
Oops, setLowercaseExpandedTerms() is an instance method, not static.
I wrote:
QueryParser has a static method setLowercaseExpandedTerms() that you can call
to turn on automatic pre-expansion query term downcasing:
Hi Justin,
[...] "*:* AND -myfield:foo*".
If my document contains myfield:foobar and myfield:dog, the document
would be thrown out because of the first field. I want to keep the
document because the second field does not match.
I'm assuming that you mistakenly used the same field name above
Hi Justin,
Unfortunately the suffix requires a wildcard as well in our case. There
are a limited number of prefixes though (10ish), so perhaps we could
combine them all into one query. We'd still need some sort of
InverseWildcardQuery implementation.
use another analyzer so you don't need
Hi Justin,
an example
PerFieldAnalyzerWrapper analyzers =
    new PerFieldAnalyzerWrapper(new KeywordAnalyzer());
// "myfield" defaults to KeywordAnalyzer
analyzers.addAnalyzer("content", new SnowballAnalyzer(luceneVersion,
    "English"));
// analyzers affects the indexed field value
you want what Lucene already does, but that's clearly not true
Hmmm, let's pretend that contents field in my example wasn't analyzed at
index
time. The unstemmed form of terms will be indexed. But if I query with a
stemmed
form or use QueryParser with the SnowballAnalyzer, I'm not going
Hi Ethan,
You'll probably get better answers about Solr specific stuff on the
solr-u...@a.l.o list.
Check out PositionFilterFactory - it may address your issue:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
Steve
-Original Message-
From:
.
Thanks,
Sudha
On Wed, Jun 23, 2010 at 12:21 PM, Steven A Rowe sar...@syr.edu wrote:
Hi Sudha,
There is such a tokenizer, named NewStandardTokenizer, in the most
recent patch on the following JIRA issue:
https://issues.apache.org/jira/browse/LUCENE-2167
It keeps (HTTP
Hi Sudha,
There is such a tokenizer, named NewStandardTokenizer, in the most recent patch
on the following JIRA issue:
https://issues.apache.org/jira/browse/LUCENE-2167
It keeps (HTTP(S), FTP, and FILE) URLs together as single tokens, and e-mails
too, in accordance with the relevant IETF
Hi Andy,
From the API docs for IndexWriter
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/IndexWriter.html:
[D]ocuments are added with addDocument and removed
with deleteDocuments(Term) or deleteDocuments(Query).
A document can be updated with
Andy,
I think batching commits either by time or number of documents is common.
Do you know about NRT (Near Realtime Search)?:
http://wiki.apache.org/lucene-java/NearRealtimeSearch. Using
IndexWriter.getReader(), you can avoid commits altogether, as well as reducing
update-search latency.
Hi Siraj,
Lucene's MemoryIndex can be used to serve this purpose.
From
http://lucene.apache.org/java/3_0_1/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html:
[T]his class targets fulltext search of huge numbers
of queries over comparatively small transient
dilemma is, I might have upto 100,000 queries to run against it.
Do you think this route will give me results in reasonable amount of
time, i.e. in a few seconds?
thanks
-siraj
On 5/17/2010 5:21 PM, Steven A Rowe wrote:
Hi Siraj,
Lucene's MemoryIndex can be used to serve this purpose
Hi Franz,
The likely problem is that you're using an index-time analyzer that strips out
the parentheses. StandardAnalyzer, for example, does this; WhitespaceAnalyzer
does not.
Remember that hits are the result of matches between index-analyzed terms and
query-analyzed terms. Except in the
Hi Aaron,
Your false positives comments point to a mismatch between what you're
currently asking Lucene for (any document matching any one of the terms in the
query) and what you want (only fully correct matches).
You need to identify the terms of the query that MUST match and tell Lucene
Hi Rene,
On 03/17/2010 at 11:17 AM, Rene Hackl-Sommer wrote:
<SpanNot fieldName="MyField">
  <Include>
    <!-- Gets all the matching spans within L_2 boundaries and includes
         them -->
    <SpanNot>
      <Include>
        <SpanNear slop="2147483647" inOrder="false">
          <SpanTerm>t293</SpanTerm>
          <SpanTerm>t4979</SpanTerm>
        </SpanNear>
Hi Rene,
Why can't you use a different field for each of the Level_X's, i.e.
MyLevel1Field, MyLevel2Field, MyLevel3Field?
On 03/15/2010 at 9:59 AM, Rene Hackl-Sommer wrote:
Search in MyField: Terms T1 and T2 on Level_2 and T3,
T4, and T5 on Level_3, which should both be in the
same
Hi Rene,
Have you seen SpanNotQuery?:
http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/spans/SpanNotQuery.html
For a document that looks like:
<Level_1 id="1">
  <Level_2 id="1">
    <Level_3 id="1">T1 T2 T3</Level_3>
    <Level_3 id="2">T4 T5 T6</Level_3>
    <Level_3 id="3">T7 T8 T9</Level_3>
Hi Erick,
On 03/08/2010 at 3:48 PM, Erick Erickson wrote:
Is there any convenient way to, say, find all the files associated with
patch ? I realize one can (hopefully) get this information from
JIRA, but... This is a subset of the problem of searching Subversion
comments.
I know of two
Hi Mark,
On 03/01/2010 at 3:35 PM, Mark Ferguson wrote:
I will be processing short bits of text (Tweets for example), and
need to search them to see if they contain certain terms.
You might consider, instead of performing reverse search, just querying all of
your locations against one document at a
Hi Max,
On 02/05/2010 at 10:18 AM, Grant Ingersoll wrote:
On Feb 3, 2010, at 8:57 PM, Max Lynch wrote:
Hi, I would like to do a search for "Microsoft Windows" as a span, but
not match if words before or after "Microsoft Windows" are upper cased.
For example, I want this to match: another
Hi Jamie,
Since phrase query terms aren't analyzed, you're getting exact matches for
terms "было" and "время", but when you search for them individually, they are
analyzed, and it is the analyzed query terms that fail to match against the
indexed terms. Sounds to me like your index-time and
Hi Jason,
Solr's PatternReplaceFilter(ts, "\\P{Alnum}+$", "", false) should work,
chained after an appropriate tokenizer.
Steve
On 02/04/2010 at 12:18 PM, Jason Rutherglen wrote:
Is there an analyzer that easily strips non alpha-numeric from the end
of a token?
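The pattern itself is plain java.util.regex, so its effect can be checked outside Solr (the class and method names below are invented for this sketch; roughly, the filter applies this kind of regex replacement to each token):

```java
import java.util.regex.Pattern;

// Illustration only: strip non-alphanumeric characters from the end of a
// token - the replacement the \P{Alnum}+$ pattern performs.
public class TrailingPunctStripSketch {
    static final Pattern TRAILING = Pattern.compile("\\P{Alnum}+$");

    static String strip(String token) {
        return TRAILING.matcher(token).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("sentence."));
        System.out.println(strip("end?!"));
        System.out.println(strip("x2"));  // nothing to strip
    }
}
```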
On 02/04/2010 at 3:24 PM, Chris Hostetter wrote:
: Since phrase query terms aren't analyzed, you're getting exact
: matches
quoted phrase passed to the QueryParser are analyzed -- but they are
analyzed as complete strings, so Analyzers that treat whitespace
specially may produce different
Hi Dennis,
You should check out payloads (arbitrary per-index-term byte[] arrays), which
can be used to encode values which are then incorporated into documents'
scores, by overriding Similarity.scorePayload():
Hi AlexElba,
The problem is that Lucene only knows how to handle character strings, not
numbers. Lexicographically, "3" > "10", so you get the expected results
(nothing).
The standard thing to do is transform your numbers into strings that sort as
you want them to. E.g., you can left-pad the
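The left-padding trick can be sketched in plain Java (the width 10 and the class name here are arbitrary choices for the illustration; pick a width wide enough for your largest value, and note that negative numbers need extra handling):

```java
// Illustration only: numbers compared as strings sort lexicographically
// ("3" > "10"), but zero-padding to a fixed width restores numeric order.
public class ZeroPadSketch {
    static String pad(long n) {
        return String.format("%010d", n);  // e.g. 3 -> "0000000003"
    }

    public static void main(String[] args) {
        System.out.println("3".compareTo("10") > 0);        // lexicographic: wrong order
        System.out.println(pad(3).compareTo(pad(10)) < 0);  // padded: numeric order
    }
}
```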
Hi AlexElba,
Did you completely re-index?
If you did, then there is some other problem - can you share (more of) your
code?
Do you know about Luke? It's an essential tool for Lucene index debugging:
http://www.getopt.org/luke/
Steve
On 01/13/2010 at 8:34 PM, AlexElba wrote:
Hello,
Hi Uwe,
On 12/08/2009 at 9:40 AM, Uwe Schindler wrote:
After the move to 3.0, you can (but you must not) further update
your code to use generics, which is not really needed but will
remove all compiler warnings.
This sounds like you're telling people that although they are able to update
Hi Nathan,
On 10/20/2009 at 5:03 PM, Nathan Howard wrote:
This is sort of related to the above question, but I'm trying to update
some (now depricated) Java/Lucene code that I've become aware of once we
started using 2.4.1 (we were previously using 2.3.2):
Hits results =
something with current hit
-Yonik
http://www.lucidimagination.com
On Tue, Oct 20, 2009 at 5:27 PM, Steven A Rowe sar...@syr.edu wrote:
Hi Nathan,
On 10/20/2009 at 5:03 PM, Nathan Howard wrote:
This is sort of related to the above question, but I'm trying to
update
some (now