Re: better stemming engine than Porter?

2008-04-22 Thread Mathieu Lecarme
The Porter stemmer is not only aggressive, it is ugly, too. The generated code is too old, not object-oriented enough, and probably too slow. If your KStem compiles with Java 1.4, why don't you suggest it for Lucene core? M. Wagner,Harry wrote: Hi HH, Here's a note I sent Solr-dev a while back:

Re: CorruptIndexException

2008-04-22 Thread Michael McCandless
Robert Haschart [EMAIL PROTECTED] wrote: To answer your questions: I completely deleted the index each time before retesting, and the java command as shown by ps does show -Xbatch. The program is running on: uname -a Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb

Re: XSLT transform before update?

2008-04-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi, there is a new patch which implements these features. I shall update the wiki with the documentation. I guess we do not need to be too worried about memory consumption. A few MB of memory should be fine (unless you are using a file that is tens of MB in size). Consider using

RE: better stemming engine than Porter?

2008-04-22 Thread Wagner,Harry
Thanks Ryan. I just opened SOLR-546. Please let me know if I can provide further help. Cheers! h -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, April 21, 2008 2:33 PM To: solr-user@lucene.apache.org Subject: Re: better stemming engine than Porter? Hey-

Re: More Like This boost

2008-04-22 Thread Erik Hatcher
On Apr 21, 2008, at 5:02 PM, Francisco Sanmartin wrote: Is it possible to boost the query that MoreLikeThis returns before sending it to Solr? I mean, technically it is possible, because you can add a factor to the whole query, but does it make sense? (Remember that MoreLikeThis can already

Re: More Like This boost

2008-04-22 Thread Francisco Sanmartin
I know that a single query of that type does not change anything. But when there are two or more with different boosts, I hope it does. Here is the situation: my docs have a Title and a Description. What I want to do is give more relevancy to the MoreLikeThis match on the title than on the description. So

Re: More Like This boost

2008-04-22 Thread Erik Hatcher
No, the MLT feature does not have that kind of field-specific boosting capability. It sounds like it could be a useful enhancement though. Of course you do get boosts for interesting terms already, but maybe having an additional field-specific boost would be a nice touch too.
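
Since MLT itself offers no field-specific boost, one client-side workaround is to build the follow-up query by hand and attach a boost to each field clause. The following is only a minimal sketch, assuming the interesting terms have already been collected; the function name `build_boosted_query`, the field names, and the boost values are hypothetical. It relies only on the standard Lucene query syntax for boosting a grouped clause (`field:(...)^boost`).

```python
def build_boosted_query(terms, field_boosts):
    """Build a Lucene-syntax query that applies a per-field boost to the same terms.

    terms: list of term strings, assumed to contain no query-syntax characters.
    field_boosts: dict mapping field name -> boost factor (hypothetical values).
    """
    if not terms:
        return ""
    group = " ".join(terms)
    # One grouped clause per field, each carrying its own boost.
    clauses = [f"{field}:({group})^{boost}" for field, boost in field_boosts.items()]
    return " ".join(clauses)


query = build_boosted_query(["solr", "stemming"], {"title": 3.0, "description": 1.0})
print(query)  # title:(solr stemming)^3.0 description:(solr stemming)^1.0
```

The resulting string could be sent as an ordinary query, so a title match counts for more than a description match without any change to MLT.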

Re: More Like This boost

2008-04-22 Thread Walter Underwood
It should help to weight the terms with their frequency in the original document. That will distinguish between two documents with the same terms, but different focus. wunder On 4/22/08 7:46 AM, Erik Hatcher [EMAIL PROTECTED] wrote: No, the MLT feature does not have that kind of field-specific
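
The idea of weighting terms by their frequency in the original document can be sketched as follows. This is an illustration only, not how MLT is implemented: the function name `weight_terms_by_frequency` and the `max_terms` cutoff are assumptions, and the raw count is used directly as the boost.

```python
from collections import Counter


def weight_terms_by_frequency(tokens, max_terms=5):
    """Return Lucene-syntax boosted terms, most frequent first."""
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically, so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return " ".join(f"{term}^{count}" for term, count in ranked[:max_terms])


# Two documents with the same terms but a different focus.
doc_a = "solr index solr query index solr".split()
doc_b = "query index query solr query index".split()
print(weight_terms_by_frequency(doc_a))  # solr^3 index^2 query^1
print(weight_terms_by_frequency(doc_b))  # query^3 index^2 solr^1
```

Both documents contain the same three terms, yet the generated queries differ because the boosts follow each document's own term counts.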

Enhancing the query language

2008-04-22 Thread Kamran Shadkhast
For the way we search news content, we need a more sophisticated query language. Currently the Solr query language is not enough for our needs. I understand it is possible to add our own customized query parser to the system, but I was wondering if anybody has done that and if

Re: better stemming engine than Porter?

2008-04-22 Thread Jay
Hi Wagner, Thanks for the intro to KStem! I quickly scanned the original paper on KStem by Robert Krovetz but could not find any timing comparison data for KStem and the Porter stemmer. I wonder how slow or fast KStem is compared to the Porter stemmer, based on your use in your application? Jay Wagner,Harry

RE: better stemming engine than Porter?

2008-04-22 Thread Wagner,Harry
Hi Jay, I did not do a timing comparison either, but any change in performance after switching to Kstem was not noticeable. Cheers... h -Original Message- From: Jay [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 22, 2008 12:26 PM To: solr-user@lucene.apache.org Subject: Re: better

Re: Highlighted field gets truncated

2008-04-22 Thread Mike Klaas
On 19-Apr-08, at 3:02 AM, Christian Wittern wrote: Mike Klaas wrote: Fragments are generated independently from matching (I realize this isn't an ideal algorithm). So it could be that the match is not part of the fragment? This sounds a bit strange. Is there a way to make sure the

logging through log4j

2008-04-22 Thread Henrib
Hi, I'm (still) seeking more advice on this deployment issue, which is how to use org.apache.log4j instead of java.util.logging. I'm not trying to restart any discussion of the respective benefits of solr4j/commons/log4j/jul; I'm seeking a way to bridge jul to log4j with the minimum specific per-container

Re: Highlighted field gets truncated

2008-04-22 Thread Christian Wittern
Mike Klaas wrote: On 19-Apr-08, at 3:02 AM, Christian Wittern wrote: So it could be that the match is not part of the fragment? This sounds a bit strange. Is there a way to make sure the fragment contains the match other than returning the whole field and doing the fragmenting myself? [...]

Re: better stemming engine than Porter?

2008-04-22 Thread Otis Gospodnetic
I actually doubt Porter's is slow. From what I recall, it's a bunch of simple if/elses. KStem can't get added to Lucene core due to its license (search Lucene JIRA for an issue that covered this several years ago). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -
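
To illustrate the "simple if/elses" point, here is a sketch of just step 1a of the Porter algorithm (plural suffix handling) as described in Porter's published rules. It is not the Lucene implementation; the function name `porter_step_1a` is chosen for this example, and the remaining steps of the algorithm are omitted.

```python
def porter_step_1a(word):
    """Apply step 1a of the Porter algorithm to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]  # sses -> ss
    if word.endswith("ies"):
        return word[:-2]  # ies -> i
    if word.endswith("ss"):
        return word       # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]  # s -> removed
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Each rule is a constant-time suffix check, which is consistent with the expectation that the stemmer is cheap per token.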

Spellchecker Question

2008-04-22 Thread Matt Mitchell
I'm using the Spellchecker handler but am a little confused. The docs say to run the cmd=rebuild when building the first time. Do I need to supply a q param with that cmd=rebuild? The examples show a url with the q param set while rebuilding, but the main section on the cmd param doesn't say much