Askar,
why do you need to add +id:idWeCareAbout?
thanks,
dt,
www.ejinz.com
search engine news forms
- Original Message -
From: Askar Zaidi [EMAIL PROTECTED]
To: java-user@lucene.apache.org; [EMAIL PROTECTED]
Sent: Wednesday, July 25, 2007 12:39 AM
Subject: Re: Fine Tuning Lucene
On Tuesday, 24 July 2007 at 13:01 -0700, Shaw, James wrote:
Hi, guys,
I found Analyzers for Japanese, Korean and Chinese, but not stemmers;
the Snowball stemmers only include European languages. Does stemming
not make sense for ideograph-based languages (i.e., no stemming is
needed for
Hi Andy
I think:
Field.Text(name, value);
has been replaced with:
new Field(name, value, Field.Store.YES, Field.Index.TOKENIZED);
Patrick
On 25/07/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Please reference "How do I get code written for Lucene 1.4.x to work with
Lucene 2.x?"
You will be unable to search for fields that do not exist which is what
you originally wanted to do, instead you can do something like:
-Establish the query that will select all non-null values
TermQuery tq1 = new TermQuery(new Term(field,value1));
TermQuery tq2 = new TermQuery(new
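A sketch of where that snippet is headed, under the thread's assumption that the field's possible values are known (the field and value names are illustrative, Lucene 2.x API):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TermQuery;

public class NoValueQuery {
    // Build a query matching documents that have NO value in `field`,
    // given the values that can occur: OR together all non-null values,
    // then subtract that set from the set of all documents.
    public static BooleanQuery build(String field, String[] knownValues) {
        BooleanQuery nonNull = new BooleanQuery();
        for (int i = 0; i < knownValues.length; i++) {
            nonNull.add(new TermQuery(new Term(field, knownValues[i])),
                        BooleanClause.Occur.SHOULD);
        }
        BooleanQuery noValue = new BooleanQuery();
        noValue.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
        noValue.add(nonNull, BooleanClause.Occur.MUST_NOT);
        return noValue;
    }
}
```

Note the pure-negative query needs the MatchAllDocsQuery clause; Lucene will not run a query consisting only of MUST_NOT clauses.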
We were affected by the great SF outage yesterday and apparently the
indexing machine crashed without being shutdown properly.
I've taken a backup of the indexes which has the usual smattering of
write.lock segments.gen, .cfs, .fdt, .fnm and .fdx etc files and looks
to be about the right size.
Simon Wistow [EMAIL PROTECTED] wrote:
We were affected by the great SF outage yesterday and apparently the
indexing machine crashed without being shutdown properly.
Eek, sorry! We are so reliant on electricity these days
I've taken a backup of the indexes which has the usual smattering
On Wed, Jul 25, 2007 at 10:08:56AM +0100, me said:
The data appears to be there - please tell me that I'm doing something
stupid and I can recover from this.
It appears by deleting the write.lock files everything has recovered.
Is this best practice? Have I just done something so terribly
Mathieu Lecarme schrieb:
On Tuesday, 24 July 2007 at 13:01 -0700, Shaw, James wrote:
Hi, guys,
I found Analyzers for Japanese, Korean and Chinese, but not stemmers;
the Snowball stemmers only include European languages. Does stemming
not make sense for ideograph-based languages (i.e., no
On Wed, Jul 25, 2007 at 05:19:31AM -0400, Michael McCandless said:
It's somewhat spooky that you have a write.lock present, because that
means you backed up while a writer was actively writing to the index,
which is a bit dangerous: if the timing is unlucky (backup does
an ls but before
Simon Wistow [EMAIL PROTECTED] wrote:
On Wed, Jul 25, 2007 at 05:19:31AM -0400, Michael McCandless said:
It's somewhat spooky that you have a write.lock present because that
means you backed up while a writer was actively writing to the index
which is a bit dangerous because if the timing
The data appears to be there - please tell me that I'm doing something
stupid and I can recover from this.
It appears by deleting the write.lock files everything has recovered.
Hmmm -- it's odd that the existence of the write.lock caused you to
lose most of your index. All that should have
On Wed, Jul 25, 2007 at 05:49:41AM -0400, Michael McCandless said:
Ahhh, OK. But do you have a segments_N file?
Yup.
Yes, this is perfect. This is the simple option I described. The
more complex option is to use a custom deletion policy which enables
you to safely do backups (even if the
Simon Wistow [EMAIL PROTECTED] wrote:
On Wed, Jul 25, 2007 at 05:49:41AM -0400, Michael McCandless said:
Ahhh, OK. But do you have a segments_N file?
Yup.
OK, though I still don't understand why the existence of write.lock
caused you to lose most of your index on creating a new writer.
Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
limited by JavaCC speed. You cannot shave much more performance out of
the grammar as it is already about as simple as it gets.
JavaCC is slow indeed. We used it for a while for Carrot2, but then (3 years
ago :) switched to
This problem has been baffling me for quite some time now and has no
perfect solution in the forum!
I have 10 documents, each with 10 fields with parameterName and
parameterValue. Now, when I search for some term and get 5 hits, how do I
find out which paramName-Value pair matched?
I am
Currently, we use regular expression pattern matching to get hold of which
field matched. Again, a pathetic solution, since we have to agree upon the
subset of the Lucene search and pattern matching. We cannot use Boolean
queries etc. in this case.
makkhar wrote:
This problem has been
I am sure a faster StandardAnalyzer would be greatly appreciated.
I'm increasing the priority of that task then :)
StandardAnalyzer appears widely used and horrendously slow. Even better
would be a StandardAnalyzer that could have different recognizers
enabled/disabled. For example,
On Jul 25, 2007, at 7:19 AM, Stanislaw Osinski wrote:
Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really
limited by JavaCC speed. You cannot shave much more performance
out of
the grammar as it is already about as simple as it gets.
JavaCC is slow indeed. We used it for
Hello!
I am working with Tomcat. I have put the Lucene highlighter.jar in the
lib folder, and I have created an extra CSS file, where I say that the
background color has to be yellow. The search word now has to be highlighted.
I have got a dataTable in which the result of the following Lucene method
Hey Guys,
I need to know how I can use the HitCollector class. I am using Hits and
looping over all the possible document hits (turns out it's 92 times I am
looping; for 300 searches, it's 300*92!). Can I avoid this using
HitCollector? I can't seem to understand how it's used.
thanks a lot,
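A minimal sketch of the Lucene 2.x HitCollector API (the method and class names here are illustrative, not from the thread). Unlike Hits, which caches and may re-execute the query as you iterate deep into the results, a HitCollector is called exactly once per matching document:

```java
import java.io.IOException;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class ScoreSum {
    // Sum the raw scores of every hit for a query, without building a
    // Hits object or looping over ranked results.
    public static float sumScores(Searcher searcher, Query query) throws IOException {
        final float[] total = new float[1];
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                total[0] += score; // called once per matching doc id
            }
        });
        return total[0];
    }
}
```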
Hi Askar,
I suggest we take a step back, and ask the question, what are you
trying to accomplish? That is, what is your application trying to
do? Forget the code, etc. just explain what you want the end result
to be and we can work from there. Based on what you have described,
I am
Hi Grant,
Thanks for the response. Here's what I am trying to accomplish:
1. Iterate over itemID (unique) in the database using one SQL query.
2. For every itemID found, run 4 searches on Lucene Index.
3. doTagSearch(itemID) ; collect score
4. doTitleSearch(itemID...) ; collect score
5.
What if I do not know all possible values of that field, which is a
typical case in a free-text search?
daniel rosher wrote:
You will be unable to search for fields that do not exist which is what
you originally wanted to do, instead you can do something like:
-Establish the query that will
So, you really want a single Lucene score (based on the scores of
your 4 fields) for every itemID, correct? And this score consists of
scoring the title, tag, summary and body against some keywords correct?
Here's what I would do:
while (rs.next())
{
doc = getDocument(itemId); // Get
Instead of refactoring the code, would there be a way to just modify the
query in each search routine?
Such as searching contents:text AND item:itemID; this means it would
just collect the score of that one document whose itemID field = the itemID
passed from while(rs.next()).
I just need to collect
Yes, you can do that.
On Jul 25, 2007, at 12:31 PM, Askar Zaidi wrote:
Here's what I mean:
http://lucene.apache.org/java/docs/queryparsersyntax.html#Fields
title:The Right Way AND text:go
Although I am not searching for the title "the right way", I am looking
for the score by specifying
Here's what I mean:
http://lucene.apache.org/java/docs/queryparsersyntax.html#Fields
title:The Right Way AND text:go
Although I am not searching for the title "the right way", I am looking
for the score by specifying a unique field (itemID).
when I do System.out.println(query);
I get:
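One way to sketch that per-item query programmatically rather than through the query parser (the field names "itemID" and "contents" are taken from the thread; this assumes itemID was indexed untokenized):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ItemQuery {
    // Restrict a keyword query to a single item by ANDing in a TermQuery
    // on the unique itemID field; the result scores only that document.
    public static Query forItem(String itemId, Query keywords) {
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("itemID", itemId)), BooleanClause.Occur.MUST);
        q.add(keywords, BooleanClause.Occur.MUST);
        return q;
    }
}
```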
In this case you should look at the source for RangeFilter.java.
Using this you could create your own filter using TermEnum and TermDocs
to find all documents that had some value for the field.
You would then flip this filter (perhaps write a FlipFilter.java, that
takes an existing filter in
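A sketch of such a flipped filter against the Lucene 2.x Filter API (the name FlipFilter is the thread's own suggestion; this is not a class that ships with Lucene):

```java
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Filter;

// Wraps an existing filter and inverts its bit set, so documents the
// inner filter excluded (e.g. docs with no value in a field) now match.
public class FlipFilter extends Filter {
    private final Filter inner;

    public FlipFilter(Filter inner) {
        this.inner = inner;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = (BitSet) inner.bits(reader).clone();
        bits.flip(0, reader.maxDoc());
        return bits;
    }
}
```

One caveat: flipping also turns on the bits of deleted documents, so a careful version would additionally clear bits for doc ids where IndexReader.isDeleted(doc) is true.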
Hey guys,
One last question and I think I'll have an optimized algorithm.
How can I build a query in my program ?
This is what I am doing:
QueryParser queryParser = new QueryParser(contents, new
StandardAnalyzer());
queryParser.setDefaultOperator(QueryParser.Operator.AND);
Query q =
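A sketch of how that snippet typically completes in Lucene 2.x (the "contents" default field is from the quoted code; AND_OPERATOR is the 2.x constant form of the default-operator setting):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class BuildQuery {
    // Parse user input into a Query; bare terms search the "contents"
    // field and are ANDed together by default.
    public static Query build(String userInput) throws ParseException {
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        parser.setDefaultOperator(QueryParser.AND_OPERATOR);
        return parser.parse(userInput);
    }
}
```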
Andy, Patrick,
Thank you. I replaced Field.Text with new Field(name, value,
Field.Store.YES, Field.Index.TOKENIZED); and it works just fine.
Cheers,
Lindsey
Patrick Kimber [EMAIL PROTECTED] wrote:
Hi Andy
I think:
Field.Text(name, value);
has been replaced with:
On Jul 25, 2007, at 1:26 PM, Askar Zaidi wrote:
Hey guys,
One last question and I think I'll have an optimized algorithm.
How can I build a query in my program ?
This is what I am doing:
QueryParser queryParser = new QueryParser(contents, new
StandardAnalyzer());
Hello,
I'm looking to extract significant terms characterizing a set of
documents (which in turn relate to a topic).
This basically comes down to functionality similar to determining the
terms with the greatest offer weight (as used for blind relevance
feedback), or maximizing tf.idf (as is
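A rough starting point for that kind of term weighting with the Lucene 2.x TermEnum API (a sketch only: it prints a plain idf-style weight per term; a real offer-weight or tf.idf computation would also fold in term frequencies within the document subset):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class TermWeights {
    // Walk every term in `field` and print a simple idf-style weight
    // based on collection-wide document frequency.
    public static void dump(IndexReader reader, String field) throws IOException {
        int numDocs = reader.numDocs();
        TermEnum terms = reader.terms(new Term(field, ""));
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) {
                    break; // ran past the end of this field's terms
                }
                double idf = Math.log((double) numDocs / (1 + reader.docFreq(t)));
                System.out.println(t.text() + "\t" + idf);
            } while (terms.next());
        } finally {
            terms.close();
        }
    }
}
```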
Hi all,
Apologies for the cryptic subject line, but I couldn't think of a more
descriptive one-liner to describe my problem/question to you all. Still
fairly new to Lucene here, although I'm hoping to have more of a clue once I
get a chance to read Lucene In Action.
I am implementing a search
On 7/25/07, Stanislaw Osinski [EMAIL PROTECTED] wrote:
JavaCC is slow indeed.
JavaCC is a very fast parser for a large document... the issue is
small fields and JavaCC's use of an exception for flow control at the
end of a value. As JVMs have advanced, exception-as-control-flow has
gotten
Hey,
Some common questions about Lucene.
1. Does an ontology wrapper exist in the Lucene implementation?
2. Does Lucene use linear hashing?
thanks,
DT,
www.ejinz.com
Search news
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
On Thursday 26 July 2007 03:12:20 daniel rosher wrote:
In this case you should look at the source for RangeFilter.java.
Using this you could create your own filter using TermEnum and TermDocs
to find all documents that had some value for the field.
That's certainly the way to do it for speed.
What kind of Highlighter strategy is Lucene using?
thanks,
Dt
www.ejinz.com
Search Engine for News
Is there a way to update a document in the Index without causing any change
to the order in which it comes up in searches?
thanks,
DT,
www.ejinz.com
Search everything
news, tech, movies, music
Hey Guys,
Thanks for all the responses. I finally got it working with some query
modification.
The idea was to pick an itemID from the database and, for that itemID in the
Index, get the scores across 4 fields; add them up and ta-da!
I still have to verify my scores.
Thanks a ton, I'll be
Hi,
I am indexing a set of constantly changing documents. The change rate is
moderate (about 10 docs/sec over a 10M document collection with a 6G
total size) but I want to be right up to date (ideally within a second
but within 5 seconds is acceptable) with the index.
Right now I have code
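One common Lucene 2.x pattern for that freshness requirement (a sketch, not the poster's code) is to keep a cached searcher and reopen it only when the index version has actually changed, so most searches pay nothing:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class SearcherCache {
    private final Directory dir;
    private IndexReader reader;
    private IndexSearcher searcher;

    public SearcherCache(Directory dir) throws IOException {
        this.dir = dir;
        this.reader = IndexReader.open(dir);
        this.searcher = new IndexSearcher(reader);
    }

    // Cheap staleness check: only reopen when a new commit exists.
    // Call this before each search (or from a once-a-second timer).
    public synchronized IndexSearcher getSearcher() throws IOException {
        if (IndexReader.getCurrentVersion(dir) != reader.getVersion()) {
            IndexReader newReader = IndexReader.open(dir);
            searcher.close();
            reader.close();
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
        return searcher;
    }
}
```

In-flight searches on the old searcher would need reference counting before close() is safe; this sketch ignores that for brevity.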
Askar Zaidi wrote:
... Heres what I am trying to accomplish:
1. Iterate over itemID (unique) in the database using one SQL query.
2. For every itemID found, run 4 searches on Lucene Index.
3. doTagSearch(itemID) ; collect score
4. doTitleSearch(itemID...) ; collect score
5.
On Wednesday 25 July 2007 00:44, Lindsey Hess wrote:
Now, I do not need Lucene to index anything, but I'm wondering if Lucene
has query parsing classes that will allow me to transform the queries.
The Lucene QueryParser class can parse the format described at
Hi guys,
Is there a way of deleting a document that, because of some corruption,
got a docID larger than maxDoc()? I'm trying to do this but I get
this Exception:
Exception in thread main java.lang.ArrayIndexOutOfBoundsException: Array
index out of range: 106577
at