Hi Cedric,
On 11/08/2007, Cedric Ho wrote:
a sentence containing characters ABC, it may be segmented into AB, C or A, BC.
[snip]
In these cases we would like to index both segmentations into the index:
AB offset (0,1) position 0
A offset (0,0) position 0
C offset (2,2) position 1
Hi anjana m,
You're going to have lots of trouble getting a response, for two reasons:
1. You are replying to an existing thread and changing the subject. Don't do
that. When you have a question, start a new thread by creating a new email
instead of replying.
2. You are not telling the list
Hi Rakesh,
Set the default QueryParser operator to AND (the default default operator :) is OR):
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/queryParser/QueryParser.html#setDefaultOperator(org.apache.lucene.queryParser.QueryParser.Operator)
Steve
On 12/18/2007 at 1:22 PM, Rakesh Shete
that D is not anymore sufficient to qualify a doc.
Hope this helps (otherwise let this reply be forever
disqualified :-) )
Doron
On Dec 18, 2007 9:28 PM, Steven A Rowe [EMAIL PROTECTED] wrote:
Hi Rakesh,
This doesn't look like a user-generated query. Have you
considered
Hi James,
Over the last two months, it has averaged roughly 15 messages per day. Feels
like more than semi-active to me.
Steve
On 12/19/2007 at 2:00 PM, Hartrich, James CTR USTRANSCOM J6 wrote:
Is this at least a semi-active list?
James
Hi Sumit,
Here's a good place to start:
http://lucene.apache.org/java/docs/scoring.html
Steve
On 12/28/2007 at 12:30 PM, sumittyagi wrote:
also
what is the lucene ranking (scoring documents) formula
sumittyagi wrote:
hi which file can i edit to change the scoring factors in
Hi,
It's in the global maven repo at:
http://repo1.maven.org/maven2/org/apache/lucene/
The 2.2.0 core jar is at:
http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/2.2.0/
Steve
On 01/03/2008 at 11:26 AM, tgospodinov wrote:
I couldn't find the url to the lucene maven repo if
Hi Ted,
On 01/03/2008 at 3:35 PM, Ted Chen wrote:
I'd like to make sure that my search engine can take
into account some non-content-based factors.
[snip]
P.S. My last email didn't get any response.
Au contraire, mon frère:
Hi Ariel,
On 01/09/2008 at 8:50 AM, Ariel wrote:
Do you know of other distributed applications that
use Lucene to index big amounts of documents?
Apache Solr is an open source enterprise search server based on the Lucene Java
search library, with XML/HTTP and JSON APIs, hit
Hi Sanjay,
On 01/09/2008 at 3:02 PM, Sanjay Dahiya wrote:
lucene-similarity (2.1.0 and 2.2.0) jar files available on maven mirrors
don't contain any files.
That's because the o.a.l.search.similar package (the sole contents of the
contrib/similarity/ directory) has been empty as of the 2.1.0
Hi Shai,
On 01/11/2008 at 7:42 AM, Shai Erera wrote:
Will IndexReader.maxDocs() - IndexReader.numDocs() give the
correct result? Or is this just a heuristic?
I think your expression gives the correct result - the abstract
IndexReader.numDocs() method is implemented in SegmentReader as:
Hi Sergey,
On 01/15/2008 at 9:57 AM, Sergey Kabashnyuk wrote:
Hi all.
I am trying to build maven artifacts from tags/lucene_2_2_0
by calling ant generate-maven-artifacts,
but BUILD FAILED:
/java/src/lucene/svn/java/tags/lucene_2_2_0/build.xml:366: The following
error occurred while
Hi Itamar,
In another thread, you wrote:
Yesterday I sent an email to this group querying about some
very important (to me...) features of Lucene. I'm giving it
another chance before it goes unnoticed or forgotten. If it
was too long please let me know and I will email a shorter
list of
On 01/22/2008 at 8:49 PM, Grant Ingersoll wrote:
On Jan 22, 2008, at 6:06 PM, Steven A Rowe wrote:
On 01/21/2008 at 2:59 PM, Itamar Syn-Hershko wrote:
2) How would I set the boosts for the headers and footnotes?
I'd rather have it stored within the index file than have to
append
Hi Itamar,
On 01/24/2008 at 2:55 PM, Itamar Syn-Hershko wrote:
Lucene does not store proximity relations between data in different
fields, only within individual fields
So are 2 calls to doc-add with the same field but different
texts considered as 1 field (latter call being
On 01/29/2008 at 10:05 AM, Grant Ingersoll wrote:
On Jan 29, 2008, at 9:29 AM, christophe blin wrote:
thanks for the pointer to the ellision filter, but I am currently stuck
with lucene-core-2.2.0 found in the maven2 central repository (it does
not contain this class). I'll watch for an upgrade to
Hi Chris,
Looks like the ElisionFilter handles the French problems you mentioned:
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/analysis/fr/ElisionFilter.html
See the code for the list of /X'/ constructions it handles:
Hi GokulAnand,
On 02/05/2008 at 12:33 AM, GokulAnand wrote:
Can someone get me the link to the Lucene 2.3 jars?
It is considered bad form on this list to reply to an existing thread with a
message on a different topic than the one already being discussed - this is
called thread hijacking.
Hi Erica,
Another good place to look is at the FAQ:
http://wiki.apache.org/lucene-java/LuceneFAQ
Steve
On 02/08/2008 at 8:10 AM, Grant Ingersoll wrote:
http://wiki.apache.org/lucene-java/MailingListArchives has a variety
of options (although the readlist one is not listed)
On Feb 8,
Hi Cesar,
On 02/11/2008 at 2:19 PM, Cesar Ronchese wrote:
I'm running into problems with document deletion.
[...]
This simply doesn't delete anything from the Index.
//see the code sample:
//theFieldName was previously stored as Field.Store.YES and
Field.Index.TOKENIZED.
Term t = new
Hi Cooper Geng,
Ferret is a Lucene-inspired Ruby search engine for Ruby - maybe that would be
useful for you?:
http://ferret.davebalmain.com/trac
Steve
On 02/19/2008 at 2:25 AM, coolgeng coolgeng wrote:
Hi guys,
Now an idea is knocking at my brain: I want to
integrate Lucene into my
\analyzers\src\java\org\apache\lucene\analysis\ngram ??
Does this tokenizer do what I need?
thank you,
-Ghinwa
On Tue, 19 Feb 2008, Steven A Rowe wrote:
Mark,
The ShingleFilter contrib has not been committed yet - it's still here:
https://issues.apache.org/jira/browse/LUCENE-400
Steve
On 02/19/2008 at 2:33 AM, markharw00d wrote:
Further to Grant's useful background - there is an analyzer specifically
for multi-word terms in contrib. See
Hi C.B.,
Yonik is referring to a Solr class:
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/WordDelimiterFilter.java?view=markup
You should theoretically be able to use this filter with straight Lucene code,
as long as it's on the classpath.
(I'm guessing
Hi,
On 02/20/2008 at 12:29 PM, sumittyagi wrote:
hi i want to rerank the documents obtained from the Hits, how can i edit
the scoring formula.
Here's a good place to start:
http://lucene.apache.org/java/docs/scoring.html
Steve
Hi Stanley,
I modernized the files in LUCENE-400 a bit - you can see the details in
comments I made on the issue. The results, including all files needed to
address the issue, are in the file attached to the issue named
LUCENE-400.patch.
I can tell you aren't using the modernized version
Hi Eran, see my comments below inline:
On 03/11/2008 at 9:23 AM, Eran Sevi wrote:
I would like to ask for suggestions of the best design for
the following scenario:
I have a very large number of XML files (around 1M).
Each file contains several sections. Each section contains
many elements
Hatcher's and Otis Gospodnetic's excellent book Lucene in
Action covers sorting:
http://www.manning.com/hatcher2/
Steve
On Tue, Mar 11, 2008 at 5:48 PM, Steven A Rowe [EMAIL PROTECTED] wrote:
Hi Eran, see my comments below inline:
On 03/11/2008 at 9:23 AM, Eran Sevi wrote:
I would like
On 03/11/2008 at 11:48 AM, Steven A Rowe wrote:
5 billion docs is within the range that Lucene can handle. I
think you should try doc = element and see how well it works.
Sorry, Eran, I was dead wrong about this assertion. See this thread for more
information:
http://www.nabble.com
Hi Darren,
Check out SpanFirstQuery and SpanRegexQuery:
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/spans/SpanFirstQuery.html
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/regex/SpanRegexQuery.html
Steve
On 03/16/2008 at 8:55 PM, Darren Govoni wrote:
Hi Bruce,
On 04/02/2008 at 4:58 PM, [EMAIL PROTECTED] wrote:
I am having a problem when searching for certain Unicode
characters, such as the Registered Trademark. That's the
Unicode character 00AE. It's also a problem searching for a
Japanese Yen symbol (Unicode character 00A5).
I'm using
Hi Prashant,
On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
We have been observing the following problem while
tokenizing using lucene's StandardAnalyzer. The tokens that we get are
different on different machines. I am suspecting it has something to do
with the Locale settings on individual
the
same nfs mounts on both the machines
Also, we have tried with Lucene 2.2.0 and 2.3.1, with the same result.
Also, about the actual string: you have it right till 2;
3, 4, 5 are a single character.
Thx
PM
On Tue, Apr 22, 2008 at 12:01 PM, Steven A Rowe
[EMAIL PROTECTED] wrote
Hi Esra,
Caveat: I don't speak, read, write, or dream in Farsi - I just know that it
mostly shares its orthography with Arabic, and that they are both written and
read right-to-left.
How are you constructing the queries? Using QueryParser? If so, then I
suspect the problem is that you
On 04/30/2008 at 12:50 PM, Steven A Rowe wrote:
Caveat: I don't speak, read, write, or dream in Farsi - I
just know that it mostly shares its orthography with Arabic,
and that they are both written and read right-to-left.
How are you constructing the queries? Using QueryParser? If
so
Hi Esra,
Going back to the original problem statement, I see something that looks
illogical to me - please correct me if I'm wrong:
On Apr 30, 2008, at 3:21 AM, esra wrote:
i am using lucene's IndexSearcher to search the given xml by
keyword which contains farsi information.
while searching
/2008 at 9:31 AM, esra wrote:
Hi Steven,
sorry i made a mistake. unicodes are like this:
د=U+62F
ژ = U+632
and the first letter of ساب ووفر is س = U+633
you can also check them here
http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html
Esra
Steven A Rowe wrote:
Hi Esra,
Going back to the original problem statement, I see something that
looks illogical to me - please correct me if I'm wrong:
On Apr 30, 2008, at 3:21 AM, esra wrote:
i am using
and the searcher works
with unicodes.
Esra
Steven A Rowe wrote:
Hi Esra,
You are *still* incorrectly referring to the glyph with three dots over
it:
On 05/02/2008 at 12:18 PM, esra wrote:
yes the correct one is ژ /ze/U+632.
ژ is *not* ze/U+632 - it is zhe/U+698.
Have you
and post back about how it works.
Thanks,
Steve
On 05/03/2008 at 8:33 AM, esra wrote:
Hi Steven,
thanks for your help
Esra
Steven A Rowe wrote:
Hi Esra,
I have created an issue for this - see
https://issues.apache.org/jira/browse/LUCENE-1279.
I'll try to take a crack
are using fa for farsi and ar for arabic.
I have added a little control for the locale parameter in my
code and now i can see the correct results.
Thank you very much for your help.
Esra.
Steven A Rowe wrote:
Hi Esra,
I have attached a patch to LUCENE-1279 containing a new
Hi PV,
On 05/07/2008 at 2:54 AM, PV wrote:
Sorry for cross posting, but why the word 'Farsi' instead of
'Persian'? No one says Lucene français or Español, or Deutsch - so why Farsi?
Please read the following article, I found it quite enlightening.
Hi Esra,
On 05/07/2008 at 11:49 AM, Steven A Rowe wrote:
At Chris Hostetter's suggestion, I am rewriting the patch
attached to LUCENE-1279, including the following changes:
- Merged the contents of the CollatingRangeQuery class into
RangeQuery and RangeFilter
- Switched the Locale
that
ConstantScoreRangeQuery doesn't have the clause limit restriction that
RangeQuery has (1024 max clauses, IIRC).
Steve
On 05/10/2008 at 1:22 PM, esra wrote:
Hi Steve,
i used the locale as ar and it works fine.
again thanks a lot for your help.
Esra
Steven A Rowe wrote:
Hi
Hi Bernd,
It's still not clear what you want to do. What will a search look like?
On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
I will try to explain what I mean with image stuff. An image in
xml-documents is usually a URL to the location where the image is
stored. Additionally, such an
,
with the binary data encoded as Base64 or something similar, you should be able
to store and retrieve it as a String. Or maybe you could store a .jar file in
a binary field - that would probably be simplest.
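The Base64 round trip suggested above can be sketched with the JDK alone (java.util.Base64 is from Java 8, later than the Lucene 2.x era under discussion, so treat this as a modern stand-in; the class and method names here are made up for illustration):

```java
import java.util.Base64;

class BinaryAsString {
    // Sketch of the round trip: binary payload -> Base64 string (safe to keep
    // in a String field) -> original bytes on retrieval.
    static String encode(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    static byte[] decode(String stored) {
        return Base64.getDecoder().decode(stored);
    }
}
```

The encoded form is plain ASCII, so it survives any String-based storage path unchanged.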
Steve
Steven A Rowe wrote:
Hi Bernd,
It's still not clear what you want to do. What
Hi tsuraan,
On 06/17/2008 at 2:31 PM, tsuraan wrote:
I'm guessing the answer is no, but is there an equivalent to that for
lucene-2.2.0?
Not exactly equivalent, but: from the apidoc for the 2.3.2 version of
setTermInfosIndexDivisor(int):
Hi Gaurav,
To which mime types are you referring?
I can't think of a tool designed for this, but one thing you might try is
checking whether the input is compressed/packed, and if so first
decompressing/unpacking it, and then using the strings program (available on
Linux and Cygwin) to
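The check-then-decompress step described above might look like this (a sketch using only the JDK, and assuming gzip is the compression format; other formats would need their own magic-byte checks):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

class MaybeGunzip {
    // Sketch: if the data starts with the gzip magic bytes (0x1f 0x8b),
    // unwrap it before handing the bytes to a strings-style scan.
    static byte[] maybeGunzip(byte[] data) throws IOException {
        if (data.length > 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) > 0; ) out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
        return data; // not gzip: scan as-is
    }
}
```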
Hi Dr. Fish,
You could make just a single query with the broadest query possible - e.g.
bacon AND country:united states
and then iterate over all results, dividing them into your three buckets based
on the values of the other two fields.
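The iterate-and-bucket step can be sketched in plain Java; the hit representation here (document id plus one other field value) is a made-up stand-in for real Lucene hits:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BucketDemo {
    // Sketch: one pass over the results of the single broad query,
    // grouping each hit's doc id by the value of another field.
    // Each hit is modeled as {docId, otherFieldValue}.
    static Map<String, List<String>> bucketByField(List<String[]> hits) {
        Map<String, List<String>> buckets = new HashMap<>();
        for (String[] hit : hits) {
            buckets.computeIfAbsent(hit[1], k -> new ArrayList<>()).add(hit[0]);
        }
        return buckets;
    }
}
```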
Steve
On 06/22/2008 at 12:29 PM, Dr. Fish wrote:
Hi Preetam,
On 07/14/2008 at 1:40 PM, Preetam Rao wrote:
Is there a query in Lucene which matches sub phrases ?
[snip]
I was redirected to Shingle filter which is a token filter
that spits out n-grams. But it does not seem to be best solution
since one does not know in advance what n in
Hi Chris,
The PhraseQuery class does no parsing; tokenization is expected to happen
before you feed anything to it. So unless you have an index-time analyzer that
outputs terms that look like aaa ddd -- that is, terms with embedded spaces
-- then attempting to use PhraseQuery or any other
Hi Erik,
I'm seeing the same problem - here's an excerpt from the headers of a bounce I
just got (note the address [EMAIL PROTECTED] in the last couple of
Received: headers):
Received: from spwiki.spsoftware.com (static61.17.14-87.vsnl.eth.net
[61.17.14.87] (may be forged))
for [EMAIL
Hi Sébastien,
Have you looked into the DisjunctionMaxQuery
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/DisjunctionMaxQuery.html?
From that page:
A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum
Hi Scott,
I think this sounds reasonable, but why not also add LATIN_EXTENDED_B and
LATIN_EXTENDED_ADDITIONAL? AFAICT, among other things, these cover some
eastern European languages and Vietnamese, respectively.
Steve
On 07/18/2008 at 5:03 PM, Scott Smith wrote:
Hi Ronald,
Caveat - I haven't tested this, but:
With a RegexQuery
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/regex/RegexQuery.html,
I think you can do something like (using your example):
+abc*123 -{Regex}(?!abc.*123$)
This query would include all documents that have
Hi Martin,
On 07/22/2008 at 5:48 AM, mpermar wrote:
I want to index some incoming text. In this case what I want
to do is just detect keywords in that text. Therefore I want
to discard everything that is not in the keywords set. This
sounds to me pretty much like the reverse of using stop
Hi Ryan,
I'm not sure Lucene's the right tool for this job.
I have used regular expressions and ternary search trees in the past to do
similar things.
Is the set of keywords too large for an in-memory solution like these? If not,
consider using a tool like the Perl package Regex::PreSuf
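The combined-regex idea (one pattern built from the whole keyword set, which is what Regex::PreSuf does in Perl, though more cleverly) can be sketched in Java; the keyword list is made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordScan {
    // Sketch: join the keyword set into a single alternation and keep only
    // the words that match, discarding everything else in the text.
    static List<String> keepKeywords(List<String> keywords, String text) {
        String alternation = keywords.stream()
                .map(Pattern::quote)             // escape any regex metacharacters
                .collect(Collectors.joining("|"));
        Pattern p = Pattern.compile("\\b(?:" + alternation + ")\\b");
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) found.add(m.group());
        return found;
    }
}
```

Unlike Regex::PreSuf, this does no prefix/suffix factoring, so it is only reasonable for keyword sets small enough that a flat alternation stays fast.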
in.
Thanks.
On Jul 23, 2008, at 3:54 PM, Steven A Rowe wrote:
Hi Ryan,
I'm not sure Lucene's the right tool for this job.
I have used regular expressions and ternary search trees in the past to
do similar things.
Is the set of keywords too large for an in-memory solution like
On 07/23/2008 at 5:09 PM, Steven A Rowe wrote:
Karl Wettin's recently committed ShingleMatrixAnalyzer
Oops, ShingleMatrixAnalyzer -> ShingleMatrixFilter.
Steve
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional
Hi René,
Since you're constructing the filter from a WildcardQuery or a PrefixQuery,
both of which use a BooleanQuery to hold a TermQuery for each matching index
term, you'll need to increase the number of clauses a BooleanQuery is allowed
to hold, by calling static method
Hi Nico,
On 08/05/2008 at 9:44 AM, Nico Krijnen wrote:
On 5 aug 2008, at 11:11, Karsten F. wrote:
Can't you store only the relevant path in an extra lucene
field and set the maximum of query-terms to e.g. 2048 ?
@Karsten: We did think about simplifying permissions to just top-level
On 08/11/2008 at 2:14 PM, Chris Hostetter wrote:
Aravind R Yarram wrote:
can i escape built-in lucene keywords like OR, AND as well?
as of the last time i checked: no, they're baked into the grammar.
I have not tested this, but I've read somewhere on this list that enclosing OR
and AND in
Hi Jeff,
I don't know of a query parser that will allow you to achieve this.
However, if you can programmatically construct (at least a component of) your
queries, then you may want to check out Lucene's SpanQuery functionality.
In particular, using your example, if you combine a
Hi Dino,
StandardAnalyzer incorporates StandardTokenizer, StandardFilter,
LowerCaseFilter, and StopFilter. Any index you create using it will only
provide case-insensitive matching.
Steve
On 08/13/2008 at 12:15 PM, Dino Korah wrote:
Also would like to highlight the version of Lucene I am
Hi Bill,
A simpler suggestion, assuming you need to test for the existence of just one
particular field: rather than adding a field containing a list of all indexed
fields for a particular document, as Karsten suggested, you could just add a
field with a constant value when the field you want
Hi Dino,
I think you'd benefit from reading some FAQ answers, like:
Why is it important to use the same analyzer type during indexing and search?
http://wiki.apache.org/lucene-java/LuceneFAQ#head-0f374b0fe1483c90fe7d6f2c44472d10961ba63c
Also, have a look at the AnalysisParalysis wiki page for
Hi Dino,
The Lucene KeywordTokenizer is about as simple as tokenizers get - it just
outputs its entire input as a single token:
http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/analysis/KeywordTokenizer.java?revision=687357&view=markup
Check out the source code for
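As a toy illustration (this is not the Lucene class itself, just a stdlib analogue of its behavior), the whole-input-as-one-token idea is simply:

```java
import java.io.IOException;
import java.io.Reader;

class WholeInputToken {
    // Toy analogue of KeywordTokenizer's behavior: the entire input
    // comes back as a single token, untouched.
    static String singleToken(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c; (c = input.read()) != -1; ) sb.append((char) c);
        return sb.toString();
    }
}
```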
Hola Juan,
On 08/21/2008 at 1:16 PM, Juan Pablo Morales wrote:
I have an index in Spanish and I use Snowball to stem and
analyze and it works perfectly. However, I am running into
trouble storing (not indexing, only storing) words that
have special characters.
That is, I store the special
Hi Sithu,
On 08/27/2008 at 3:13 PM, Sudarsan, Sithu D. wrote:
2. Where do we look for sample codes? Or detailed tutorials?
Lots of good stuff here:
http://wiki.apache.org/jakarta-lucene
and particularly here (books, articles, presentations, oh my!):
Hi gaz77,
Here's a good place to start:
http://wiki.apache.org/jakarta-lucene/AnalysisParalysis
Steve
On 08/28/2008 at 10:52 AM, gaz77 wrote:
Hi,
I'd appreciate if someone could explain the results I'm getting.
I've written a simple custom analyzer that applies the
NGramTokenFilter
Hi Yannis,
On 08/28/2008 at 12:12 PM, Yannis Pavlidis wrote:
I am trying to boost the freshness of some of our documents
in the index using the most efficient way (i.e. if 2 news
stories have the same score based on the content then I want
to promote the one that was created last)
[...]
.
Thanks,
Yannis.
-Original Message-
From: Steven A Rowe [mailto:[EMAIL PROTECTED]
Sent: Thu 8/28/2008 10:27 AM
To: java-user@lucene.apache.org
Subject: RE: boost freshness instead of sorting
Hi Yannis,
On 08/28/2008 at 12:12 PM, Yannis Pavlidis wrote:
I am trying to boost
Hi Raymond,
Check out SinkTokenizer/TeeTokenFilter:
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/TeeTokenFilter.html
Look at the unit tests for usage hints:
Hi mck,
On 09/09/2008 at 12:58 PM, Mck wrote:
*ShortVersion*
is there a way to make the ShingleFilter perform exact matching via
inserting ^ $ begin/end markers?
Reading through the mailing list i see how exact matching can
be done, a la STFW to myself...
So the ShortVersion now
On 09/09/2008 at 4:38 PM, Mck wrote:
Looks to me like MultiPhraseQuery is getting in the way. Shingles
that begin at the same word are given the same position by
ShingleFilter, and Solr's FieldQParserPlugin creates a
MultiPhraseQuery when it encounters tokens in a query with the same
Hi mck,
On 09/10/2008 at 3:55 AM, Mck wrote:
probably better to change the one instance of .setPositionIncrement(0)
to .setPositionIncrement(1) - that way, MultiPhraseQuery will not be
invoked, and the standard disjunction thing should happen.
Tried this.
As you say i end up with
Hi Micah,
On 09/09/2008 at 11:57 PM, Micah Jaffe wrote:
I'm [...] curious how weights are calculated.
[...]
thoughts? pointers? best practices?
http://lucene.apache.org/java/docs/scoring.html
On 09/10/2008 at 12:02 PM, Mck wrote:
But this does not return the hits i want.
Have you tried submitting the query without quotes? (That's where the
PhraseQuery likely comes from.)
Yes. It does not work. It returns just the unigrams, again the same
behaviour as mentioned earlier.
Hi Marie,
On 09/11/2008 at 4:03 AM, Marie-Christine Plogmann wrote:
I am currently using the demo class IndexFiles to index some
corpus. I have replaced the StandardAnalyzer with a GermanAnalyzer.
Here, indexing works fine.
But if i specify a different stopword list that should be
used, the
Hi Daniel,
On 09/22/2008 at 12:49 AM, Daniel Noll wrote:
I have a question about Korean tokenisation. Currently there
is a rule in StandardTokenizerImpl.jflex which looks like this:
ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+
LUCENE-1126 https://issues.apache.org/jira/browse/LUCENE-1126
Hi Edwin,
I don't know specifically what's causing the exception you're seeing, but note
that in Lucene 2.3.0+, the JavaCC-generated version of StandardTokenizer (where
your exception originates) has been replaced with a JFlex-generated version -
see
Hi Paul,
On 10/16/2008 at 12:00 PM, [EMAIL PROTECTED] wrote:
Still a newbie here, sorry:
a) I can see how to get a zip/jar of the Lucene
v.2.2.0 (http://www.urlstructure.com/apache/lucene/java/archive/)
or v.2.3.0 (http://www.urlstructure.com/apache/lucene/java/)
b) but none of
On 10/20/2008 at 12:41 PM, mil84 wrote:
doc.add(new Field("Title", "hohoho", Field.Store.YES, Field.Index.TOKENIZED));
[...]
3) Searching in title - it DOESN'T WORK (I try to find hohoho, and get nothing).
[...]
QueryParser parser = new QueryParser("title", new StandardAnalyzer());
Field names are
Hi James,
On 10/23/2008 at 8:30 AM, James liu wrote:
public class AnalyzerTest {
@Test
public void test() throws ParseException {
QueryParser parser = new MultiFieldQueryParser(new String[]{"title",
"body"}, new StandardAnalyzer());
Query query1 = parser.parse("中文");
Hi Aashish,
On 10/24/2008 at 3:35 AM, Agrawal, Aashish (IT) wrote:
I want to use lucene for a simple search engine with regex support .
I tried using RegexQuery.. but seems I am missing something.
Is there any working exmaple on using RegexQuery ??
How about TestRegexQuery?:
Hi Aashish,
On 10/26/2008 at 11:36 PM, Agrawal, Aashish (IT) wrote:
I am searching a sample file like below -
---
agrawal fdfdf
fsdfafasf 3495549584
fsfsfs fsffsf r4e3fdere j4343
-
when I search this file with pattern -
.*4343*
.*[a-z]4343
j4343
or even search for
Hi Peter,
On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
I've discovered another flaw in using this technique:
(+contents:petroleum +contents:engineer +contents:refinery)
(+boost:petroleum +boost:engineer +boost:refinery)
It's possible that the first clause will produce a matching
doc and
Hi Sergey,
On 11/20/2008 at 9:30 AM, Sergey Kabashnyuk wrote:
How can I convert java.math.BigDecimal numbers into strings
for storage in lexicographic order
Here's a thoroughly untested idea, cribbing some from
o.a.l.document.NumberTools[1]: convert BigDecimals into strings of the
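The (thoroughly untested, as noted above) idea might be sketched like this; the offset, scale, and width are made-up parameters, and values are assumed to fit within the chosen range:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class LexicographicDecimal {
    // Sketch: shift values non-negative, fix the scale, and zero-pad to a
    // constant width, so that String.compareTo agrees with numeric order.
    static final BigDecimal OFFSET = new BigDecimal("1000000000"); // assumed magnitude bound
    static final int SCALE = 8;   // fraction digits kept
    static final int WIDTH = 20;  // total encoded length

    static String encode(BigDecimal value) {
        String plain = value.add(OFFSET)
                .setScale(SCALE, RoundingMode.HALF_UP)
                .toPlainString();
        StringBuilder padded = new StringBuilder();
        for (int i = plain.length(); i < WIDTH; i++) padded.append('0');
        return padded.append(plain).toString();
    }
}
```

Any value outside ±OFFSET, or needing more than SCALE fraction digits, breaks the ordering, so the bounds have to be chosen for the data at hand.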
Hi Sam,
On 12/04/2008 at 8:21 PM, samd wrote:
Where can I get the Lucene source for the Snowball implementation.
I need to be able to search for words that are alphanumeric
and this does not work with the current snowballanalyzer.
Lucene-java's source is available through its revision control
Hi Thomas,
On 12/17/2008 at 11:52 AM, Thomas J. Buhr wrote:
Where can I see how IndexWriter.updateDocument works without getting
into Lucene all over again until this important issue is resolved?
Is there a sample of its usage for updating specific fields in a
given document?
The
Hi Peter,
On 01/12/2009 at 1:43 PM, peter.aisher wrote:
... the contents of the FILE field is the definition. the problem
is that the contents of this field is just garbled text. is there
any obvious compression technique which might have been used to
store this? The text in the files
. Carece de significación precisa. Amatar.
Asustar. Avenar.
a-2.
( Del gr. ἀ-, priv. ).
1. pref. Denota privación o negación. Acromático.
Ateísmo. Ante vocal
toma la forma an-. Anestesia. Anorexia.
Steven A Rowe wrote:
Hi Peter,
On 01/12/2009 at 1:43 PM, peter.aisher
Hi Nitin,
Lucene in Action 2nd edition http://www.manning.com/hatcher3/ is a
good place to start.
If you want free stuff, check out the Lucene wiki Resources page:
http://wiki.apache.org/lucene-java/Resources. Also, some basic code
on the wiki: http://wiki.apache.org/lucene-java/TheBasics.
Hi Dragon Fly,
You could split the original document into multiple Lucene Documents,
one for each array index, all sharing the same DocID field value.
Then your queries just work. But you'd have to do result
consolidation, removing duplicate original docs when you get matches at
multiple array
On 2/24/2009 at 5:36 PM, Chris Hostetter wrote:
Shingling is (lucene specific?) vernacular for word based ngrams
Shingle is not a Lucene-specific term - here's an entry, e.g., from an
IBM Glossary of terms for enterprise search at
Hi Raymond,
On 3/1/2009, Raymond Balmès wrote:
I'm trying to index ( search later) documents that contain tri-grams
however they have the following form:
string 2 digit 2 digit
Does the ShingleFilter work with numbers in the match ?
Yes, though it is the tokenizer and previous filters in
Hi Raymond,
On 3/2/2009 at 10:09 AM, Raymond Balmès wrote:
suppose I have a tri-gram, what I want to do is index the tri-gram
string digit1 digit2 as one indexing phrase, and not index each token
separately.
As long as you don't want any transformation performed on the phrase or its
On 3/2/2009 at 4:22 PM, Grant Ingersoll wrote:
On Mar 2, 2009, at 2:47 PM, Ken Williams wrote:
Also, while perusing the threads you refer to below, I saw a
reference to the following link, which seems to have gone dead:
https://issues.apache.org/bugzilla/show_bug.cgi?id=31841
Hmm,
Hi Raymond,
On 3/3/2009 at 12:04 PM, Raymond Balmès wrote:
The range query only works on fields (using a string compare)... is
there any reason why it is not possible on the words of the document.
The following query [stringa TO stringb] would just give the list of
documents which contains
Hi Raymond,
On 3/3/2009 at 1:19 PM, Raymond Balmès wrote:
On Tue, Mar 3, 2009 at 7:18 PM, Raymond Balmès
raymond.bal...@gmail.comwrote:
Just a simplified view of my problem :
A document contains the terms index01 blabla index02 xxx yyy index03
... index10. I have the terms indexed in