Re: query.extractTerms(..) on rewritten queries

2014-10-07 Thread Christian Reuschling
.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Christian Reuschling [mailto:reuschl...@dfki.uni-kl.de] Sent: Monday, October 06, 2014 6:06 PM To: java-user@lucene.apache.org Subject: query.extractTerms(..) on rewritten queries

query.extractTerms(..) on rewritten queries

2014-10-06 Thread Christian Reuschling
Hi, I am currently migrating to Lucene 4. In the past, I used a trick to get the index-specific terms for a given (wildcard) query (see below), but it no longer works: String queryString = "n*"; // gives no result // String queryString = "nöä"; //

BooleanWeight.scorer() gives a TermScorer

2014-08-07 Thread Christian Reuschling
Hello, I am trying to get the scorer for a result document, for further computation. List<AtomicReaderContext> leafContexts = indexReader.leaves(); int n = ReaderUtil.subIndex(scoreDoc.doc, leafContexts); AtomicReaderContext ctx = leafContexts.get(n);
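The lookup in the snippet above maps an index-wide document number to the segment (leaf) that contains it. As a rough, library-free sketch of what such a lookup does: the class `LeafLookup` and the plain `int[] docBases` array are hypothetical stand-ins for the leaf contexts and their `docBase` values, not Lucene API. It assumes the array is non-empty, sorted ascending, and starts at 0.

```java
/** Hypothetical illustration only; not part of Lucene. */
final class LeafLookup {
    private LeafLookup() {}

    /**
     * Returns the index of the last leaf whose docBase is <= globalDoc.
     * Assumes docBases is non-empty, sorted ascending, and docBases[0] == 0.
     */
    static int subIndex(int globalDoc, int[] docBases) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            // Take the upper middle so that lo always advances.
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Converts an index-wide document number to a leaf-local one. */
    static int localDoc(int globalDoc, int[] docBases) {
        return globalDoc - docBases[subIndex(globalDoc, docBases)];
    }
}
```

With leaf bases {0, 10, 25}, document 12 falls into leaf 1 and has the leaf-local number 2. The leaf-local number matters because a per-segment scorer works with segment-relative document ids.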

Re: Migration Lucene 3=>4: IndexSearcher.setDefaultFieldSortScoring(..)

2014-07-22 Thread Christian Reuschling
, at 10:17 AM, Christian Reuschling reuschl...@dfki.uni-kl.de wrote: We are currently migrating a project to Lucene 4 and noticed that the method IndexSearcher.setDefaultFieldSortScoring(..) disappeared in Lucene 4.0. We can't find anything about this in the migration guide. Further

Migration Lucene 3=>4: IndexSearcher.setDefaultFieldSortScoring(..)

2014-07-18 Thread Christian Reuschling
We are currently migrating a project to Lucene 4 and noticed that the method IndexSearcher.setDefaultFieldSortScoring(..) disappeared in Lucene 4.0. We can't find anything about this in the migration guide. Further, it was never deprecated in Lucene 3,

searching multiple remote indices

2014-06-18 Thread Christian Reuschling
an exotic case. Or is it? Thanks from the whole DFKI Lucene crew! Christian - -- __ Christian Reuschling, Dipl.-Ing.(BA) Software Engineer Knowledge Management Department German Research Center for Artificial

transparently access a remote index: new alternative to old RemoteSearchable / Searcher interfaces

2014-06-04 Thread Christian Reuschling
I remember that there was a general Searcher interface, with the standard IndexSearcher as a subclass, plus another subclass that enabled RMI-based remote access to an index. If you used Searcher in your codebase, the code was independent of

create a Filter/DocIdSet from a number of documents

2014-03-12 Thread Christian Reuschling
I have a small set of document numbers as a query result, collected with a non-scoring collector. Now I want to run fast successive queries restricted to this set of document numbers, as part of a customized Similarity implementation

tf/idf similarity with modified document similarity

2014-03-06 Thread Christian Reuschling
Hello, what is the best way to score documents similarly to the default similarity, but with the document frequency calculated per query against the matching result document set rather than statically against the whole corpus? I didn't find a good and

Re: FuzzySuggester EXACT_FIRST criteria

2013-11-20 Thread Christian Reuschling
very complex. Thanks a lot! Christian Reuschling On 15.11.2013 18:49, Michael McCandless wrote: Hmm, I'm not sure offhand why that change gives you no results. The fullPrefixPaths should have been a super-set of the original prefix paths, since the LevA just adds further paths. Mike

Re: FuzzySuggester EXACT_FIRST criteria

2013-11-14 Thread Christian Reuschling
? On 14.11.2013 17:05, Michael McCandless wrote: On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling christian.reuschl...@gmail.com wrote: We started to implement named entity recognition based on AnalyzingSuggester, which offers great support for synonyms, stopwords, etc

FuzzySuggester EXACT_FIRST criteria

2013-11-13 Thread Christian Reuschling
We started to implement named entity recognition based on AnalyzingSuggester, which offers great support for synonyms, stopwords, etc. For this, we slightly modified AnalyzingSuggester.lookup() to return only the exactFirst hits (considering the exactFirst code block only, skipping

Re: Empty numeric field

2012-02-15 Thread Christian Reuschling
placeholder value (like -1, infinity, NaN). If you only need it in the stored fields, just store it but don't index it. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Christian Reuschling

Re: Empty numeric field

2012-02-15 Thread Christian Reuschling
http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Christian Reuschling [mailto:christian.reuschl...@gmail.com] Sent: Wednesday, February 15, 2012 12:58 PM To: java-user Subject: Empty numeric field Hi all, for some reason, we need empty numeric

Re: Numeric field min max values

2011-11-08 Thread Christian Reuschling
values by looking at the first few bits, which contains the precision. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Christian Reuschling [mailto:christian.reuschl...@gmail.com] Sent: Monday

Re: Numeric field min max values

2011-11-07 Thread Christian Reuschling
presume) to int or long or whatever. Maybe that will help. -- Ian. On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling christian.reuschl...@gmail.com wrote: Hi, maybe it is an easy question - I searched the lucene-user archive, but sadly didn't find an answer :( I currently

Re: Numeric field min max values

2011-11-03 Thread Christian Reuschling
. Maybe that will help. -- Ian. On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling christian.reuschl...@gmail.com wrote: Hi, maybe it is an easy question - I searched the lucene-user archive, but sadly didn't find an answer :( I am currently changing our field logic from string

Numeric field min max values

2011-11-02 Thread Christian Reuschling
Hi, maybe it is an easy question - I searched the lucene-user archive, but sadly didn't find an answer :( I am currently changing our field logic from string to numeric fields. Until now, I managed to find the min-max values of a field by iterating over the field with a TermEnum (termEnum =

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Christian Reuschling
Hello Michael, I would also prefer B: it shortens the time until our applications benefit from new Lucene features. It forces our lazy programmers (I am one, of course ;) ) to deal with them, and reduces the effort of moving to a major release afterwards. Maybe some minimum time waiting

Filter for searching in result lists with 2.9

2009-10-16 Thread Christian Reuschling
Hi guys, our app offers the possibility to search inside a set of documents that is the result list of a former search. Thus, a user can narrow down a search according to different criteria. For this, we implemented a simple Filter that simply gets a TopDocs object and creates a bitSet out
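The core of the approach described above, turning the document numbers of a former result list into a bit set, can be sketched without any Lucene dependency. This is only an illustration under assumptions: the class `ResultSetBits` is hypothetical, and the plain `int[] docIds` stands in for the `doc` values of the `scoreDocs` in a `TopDocs` object. Wrapping the bits into a Lucene Filter is deliberately left out, since that part of the API differs between Lucene versions.

```java
import java.util.BitSet;

/** Hypothetical illustration only; not the Filter from the original thread. */
final class ResultSetBits {
    private ResultSetBits() {}

    /**
     * Marks every given document number in a bit set of size maxDoc.
     * Duplicate document numbers are harmless: the bit is simply set again.
     */
    static BitSet fromDocIds(int[] docIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : docIds) {
            if (docId < 0 || docId >= maxDoc) {
                throw new IllegalArgumentException("document number out of range: " + docId);
            }
            bits.set(docId);
        }
        return bits;
    }
}
```

A follow-up search would then accept only documents whose bit is set, for example by checking `bits.get(docId)` for each candidate.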

How to sort and get document scores afterwards

2009-10-15 Thread Christian Reuschling
Hi, our application enables sorting the result lists according to field values, currently all represented as Strings (we plan to also migrate to the new numeric type capabilities of Lucene 2.9 at a later time). For this, the documents will be sorted e.g. according to the author, which works fine

Re: Search By Phrase Not Working

2009-10-08 Thread Christian Reuschling
Hi, I saw similar behaviour. On a self-built index of the German Wikipedia I searched for the phrase blaue blume and got 2 results. When I searched for +blaue blume vogel I got 59 results...strange. I found out that when I create a plain BooleanQuery with just the phrase blaue blume gives

Re: Reverse stemmer?

2009-10-08 Thread Christian Reuschling
Hi, looking up the different terms with a common stem can be useful in different scenarios - so I don't want to judge whether someone needs it or not. E.g., if you have multilingual documents in your index, it is straightforward to determine the language of the documents in order to

Re: How to normalize Lucene score?

2009-08-17 Thread Christian Reuschling
Hi Prashant, we let the scores converge towards 1, which they never reach, so that the ratings also remain correct for higher Lucene scores, which are more or less open-ended: normalizedScore = 1 - [ 1 / (1+luceneScore) ] best Christian On Sun, 16 Aug 2009 19:04:44 +0530 prashant
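The formula quoted above can be written as a small helper. This is a minimal sketch: the class and method names are hypothetical, and the rejection of negative scores is an added assumption rather than something stated in the thread.

```java
/** Hypothetical helper illustrating normalizedScore = 1 - 1 / (1 + luceneScore). */
final class ScoreNormalizer {
    private ScoreNormalizer() {}

    /** Maps a non-negative, open-ended raw score into the range [0, 1). */
    static double normalize(double luceneScore) {
        if (Double.isNaN(luceneScore) || luceneScore < 0.0) {
            throw new IllegalArgumentException("score must be non-negative: " + luceneScore);
        }
        return 1.0 - 1.0 / (1.0 + luceneScore);
    }
}
```

A raw score of 0 maps to 0, 1 maps to 0.5, and 3 maps to 0.75; larger raw scores always give larger normalized scores, so the ranking order is preserved. Mathematically the result stays below 1, although for extremely large raw scores floating-point rounding can return exactly 1.0.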

ParallelMultiSearcher and idf

2009-08-04 Thread Christian Reuschling
Hello, when searching over multiple indices, we create one IndexReader for each index, and wrap them into a MultiReader, that we use for IndexSearcher creation. This is fine for searching multiple indices on one machine, but in the case the indices are distributed over the (intra)net, this

Determining index term count

2009-01-07 Thread Christian Reuschling
Is there a fast way to determine the total number of terms inside an index? Currently I only found the way to walk through the TermEnumeration, i.e. TermEnum termEnum4TermCount = reader.terms(); int iTermCount = 0; while (termEnum4TermCount.next()) iTermCount++; termEnum4TermCount.close();

Re: 1:n queries again

2008-11-13 Thread Christian Reuschling
is a statement of the problem you're trying to solve, because I'm having trouble understanding the underlying use-cases.. Best Erick On Wed, Nov 12, 2008 at 10:17 AM, Christian Reuschling [EMAIL PROTECTED] wrote: Hello Erick, thank you very much for this interesting idea - but I'm

Re: 1:n queries again

2008-11-12 Thread Christian Reuschling
behaviour, you need some kind of logical 'grouping' of one dataset. whereby a query 'term1 term4' should NOT match, 'term1 term2' must match. Stefan Trcek schrieb: On Wednesday 12 November 2008 14:58:53 Christian Reuschling wrote: In order to offer some simple 1:n matching, currently we create

Re: 1:n queries again

2008-11-12 Thread Christian Reuschling
not being important: attName:startDelimiter myterm2 myterm1 endDelimiter...should also match Did you really mean to have myterm2 in front of myterm1? Best Erick On Wed, Nov 12, 2008 at 8:58 AM, Christian Reuschling [EMAIL PROTECTED] wrote: Hello Friends, In order to offer some

1:n queries again

2008-11-12 Thread Christian Reuschling
, or do I have to write my own Query implementation - and what would be the best way in this case. Thanks in advance Christian Reuschling

term offsets wrong depending on analyzer

2008-11-07 Thread Christian Reuschling
, greetings Christian Reuschling package org.dynaq; import org.apache.lucene.analysis.KeywordAnalyzer; import org.apache.lucene.analysis.PerFieldAnalyzerWrapper; import org.apache.lucene.analysis.WhitespaceAnalyzer; import org.apache.lucene.document.Document; import

which version of lucene do you recommend

2008-09-09 Thread Christian Reuschling
In the past, I had really good experiences with the svn versions of Lucene - I never had problems, and everything felt stable. Currently, I get unexpected exceptions from time to time: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _3g6n.fdx

yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Christian Reuschling
Hello people, yes, there have been several threads about this topic, but sadly I have to revive it, I'm sorry. The first one I found was a discussion from May 2005: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL PROTECTED] There the final solution suggestion from Hoss

yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Christian Reuschling
Hello people, I'm sorry if I have sent this message twice - my gmail interface merges the mails in the 'sent' folder with incoming mails from my address - strange, but I can't tell whether the mail was sent - I only see it in the sent folder (with only one label on it, which leads me to send it again

Refreshing IndexReaders for our desktop searching app

2008-05-28 Thread Christian Reuschling
Hello out there, we have implemented an open source desktop search app based on Lucene: http://sourceforge.net/projects/dynaq Development is ongoing, and we are currently experimenting with the file-lock based writer (/reader) synchronization capabilities of Lucene, in order to

Refreshing IndexReaders for our desktop searching app

2008-05-28 Thread Christian Reuschling
Hello out there, we have implemented an open source desktop search app based on Lucene: http://sourceforge.net/projects/dynaq Development is ongoing, and we are currently experimenting with the file-lock based writer (/reader) synchronization capabilities of Lucene, in order to

Refreshing IndexReaders for our desktop searching app

2008-05-27 Thread Christian Reuschling
Hello out there, we have implemented an open source desktop search app based on Lucene: http://sourceforge.net/projects/dynaq Development is ongoing, and we are currently experimenting with the file-lock based writer (/reader) synchronization capabilities of Lucene, in order to

Re: SoundEx

2006-01-18 Thread Christian Reuschling
Yes, look at the 'contributions' link on the Lucene homepage. The 'Phonetix' project provides implementations of Soundex, Metaphone and Double Metaphone. Simply use their analyzer. I am not sure what the behaviour is in the case of wildcards. Does anyone have an answer? regards Christian Steven