Re: Chinese Segmentation with Phase Query

2007-11-10 Thread Uwe Goetzke
of the abbreviations) Regards Uwe Goetzke -Original Message- From: Cedric Ho [mailto:[EMAIL PROTECTED] Sent: Saturday, November 10, 2007 02:28 To: java-user@lucene.apache.org Subject: - Re: Chinese Segmentation with Phase Query On Nov 10, 2007 2:08 AM, Steven A Rowe [EMAIL PROTECTED] wrote

feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-01 Thread Uwe Goetzke
This week I switched the Lucene library version on one customer system. The indexing time went down from 46m32s to 16m20s for the complete task, including optimisation. Great job! We index product catalogs from several suppliers, in this case around 56,000 product groups and 360,000 products

Re: Does Lucene support partition-by-keyword indexing?

2008-03-01 Thread Uwe Goetzke
Hi, I do not yet fully understand what you want to achieve. You want to split the index by keywords to reduce the time needed to distribute indexes? And you want to distribute the queries to the nodes based on the same split mechanism? You have several nodes with different kinds of documents.

Re: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-24 Thread Uwe Goetzke
else, and this is why the overall indexing process is not much faster. Best Regards, Ivan Uwe Goetzke wrote: This week I switched the Lucene library version on one customer system. The indexing time went down from 46m32s to 16m20s for the complete task including

Re: Implement a relaxed PhraseQuery?

2008-03-24 Thread Uwe Goetzke
Hi Cuong, I have written a TolerantPhraseScorer, starting with the code from PhraseScorer, but I think I have modified it too much to be generally useful. We use it with bigram clusters and therefore do not need the slop factor for scoring, but have a tolerance factor (depending on the length
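The scorer itself is not part of this excerpt. As a rough illustration only, assuming "bigram clusters" refers to overlapping character bigrams, the hypothetical sketch below (the class `BigramOverlap` and its methods are invented names, not part of Lucene or of the original scorer) shows how comparing bigrams instead of whole terms tolerates small spelling differences. It ignores term positions, so it is not a phrase scorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the TolerantPhraseScorer from the thread.
final class BigramOverlap {
    private BigramOverlap() {}

    // Overlapping character bigrams of the lower-cased term,
    // e.g. "cable" -> [ca, ab, bl, le]. Terms shorter than 2 chars yield none.
    static List<String> bigrams(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            grams.add(t.substring(i, i + 2));
        }
        return grams;
    }

    // Fraction of query bigrams that also occur in the candidate.
    // Each candidate bigram can be matched at most once.
    static double overlap(String query, String candidate) {
        List<String> q = bigrams(query);
        if (q.isEmpty()) {
            return 0.0;
        }
        List<String> remaining = new ArrayList<>(bigrams(candidate));
        int hits = 0;
        for (String gram : q) {
            if (remaining.remove(gram)) { // List.remove(Object) returns true if found
                hits++;
            }
        }
        return (double) hits / q.size();
    }
}
```

A caller could treat `overlap(query, candidate) >= threshold` as a match, with the threshold playing a role similar to a tolerance factor; how the original scorer derives its tolerance from the length cannot be recovered from this excerpt.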

Re: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-25 Thread Uwe Goetzke
the NGramAnalyzer? -jake On 3/24/08, Uwe Goetzke [EMAIL PROTECTED] wrote: Hi Ivan, no, we do not use StandardAnalyzer or StandardTokenizer. Most data is processed by fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader); result = new ISOLatin2AccentFilter
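ISOLatin2AccentFilter appears to be a custom filter (Lucene itself shipped ISOLatin1AccentFilter), and its code is not part of this excerpt. The plain-Java sketch below approximates the same two-step chain without Lucene. It splits on whitespace the way WhitespaceTokenizer does, then strips accents through Unicode decomposition with java.text.Normalizer. The decomposition approach is a swapped-in technique, and the class name `AccentFoldingTokenizer` is hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WhitespaceTokenizer + accent filter chain.
final class AccentFoldingTokenizer {
    private AccentFoldingTokenizer() {}

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split on runs of whitespace.
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string after split
            }
            // Step 2: decompose accented letters (e.g. e-acute -> e + combining accent)
            // and drop the combining marks.
            String decomposed = Normalizer.normalize(raw, Normalizer.Form.NFD);
            tokens.add(decomposed.replaceAll("\\p{M}+", ""));
        }
        return tokens;
    }
}
```

Letters without a canonical decomposition, such as the Polish ł, pass through unchanged. A dedicated mapping table, presumably what a Latin-2 specific filter provides, may still be needed for those.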

Re: Re: feedback: Indexing speed improvement lucene 2.2-2.3.1

2008-03-26 Thread Uwe Goetzke
? Thanks! Jay Uwe Goetzke wrote: Hi Ivan, no, we do not use StandardAnalyzer or StandardTokenizer. Most data is processed by fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader); result = new ISOLatin2AccentFilter(result

Re: Transforming German umlauts like ö, ä, ü, ß into oe, ae, ue, ss

2008-11-18 Thread Uwe Goetzke
; } } return output.toString(); } } Regards Uwe Goetzke Head of Product Development Healy Hudson GmbH Procurement Retail Solutions -Original Message- From: Sascha Fahl [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 18
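Only the tail of the transformation method survives in this excerpt (`return output.toString();`). The following minimal sketch shows what such a method might look like, assuming a character-by-character StringBuilder loop. `UmlautFolder` and `fold` are hypothetical names. In a Lucene analyzer, the same logic could be wrapped in a custom TokenFilter.

```java
// Hypothetical helper: folds German umlauts and sharp s into ASCII digraphs.
final class UmlautFolder {
    private UmlautFolder() {}

    static String fold(String input) {
        // Folding can only lengthen the string, so reserve a little extra room.
        StringBuilder output = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00e4': output.append("ae"); break; // a-umlaut
                case '\u00f6': output.append("oe"); break; // o-umlaut
                case '\u00fc': output.append("ue"); break; // u-umlaut
                case '\u00c4': output.append("Ae"); break; // A-umlaut
                case '\u00d6': output.append("Oe"); break; // O-umlaut
                case '\u00dc': output.append("Ue"); break; // U-umlaut
                case '\u00df': output.append("ss"); break; // sharp s
                default:       output.append(c);
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Gr\u00f6\u00dfe \u00dcberg\u00e4nge")); // Groesse Uebergaenge
    }
}
```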

Re: Most frequently indexed term

2009-06-08 Thread Uwe Goetzke
Hello Ganesh, what about making a separate index for each day, running your analysis on it, and merging that index afterwards? I am not sure, but I think this might work. Use MultiSearcher for the search. Regards Uwe Goetzke -Original Message- From: Ganesh [mailto:emailg...@yahoo.co.in
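The per-day idea can be sketched without Lucene by treating each daily index as a map from term to count and merging the maps afterwards. With Lucene 2.x/3.x, the per-day counts could come from walking `IndexReader.terms()` and reading `TermEnum.docFreq()`. That gives document frequency rather than total occurrences. Because each document lives in exactly one daily index, summing these counts gives the overall document frequency. The class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: each input map holds term -> count for one daily index.
final class DailyTermCounts {
    private DailyTermCounts() {}

    // Sums the counts of all daily maps into one overall map.
    static Map<String, Long> mergeCounts(List<Map<String, Long>> perDay) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> day : perDay) {
            for (Map.Entry<String, Long> e : day.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    // Returns the term with the highest merged count, or null for an empty map.
    // Ties are resolved arbitrarily (HashMap iteration order).
    static String mostFrequent(Map<String, Long> counts) {
        String best = null;
        long bestCount = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```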

MergePolicy$MergeException because of FileNotFoundException because wrong path to index-file

2009-08-31 Thread Uwe Goetzke
GetPropertyAction(file.separator))).charAt(0); This looks more than strange to me... Any idea? Regards Uwe Goetzke --- Healy Hudson GmbH - D-55252 Mainz Kastel, Managing Director: Christian Konhauser - Amtsgericht (Local Court) Wiesbaden HRB

Re: MergePolicy$MergeException because of FileNotFoundException because wrong path to index-file

2009-08-31 Thread Uwe Goetzke
Oops, sorry: 2.4.1. Thanks, Uwe Goetzke -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Monday, August 31, 2009 17:42 To: java-user@lucene.apache.org Subject: RE: MergePolicy$MergeException because of FileNotFoundException because wrong path to index-file

Re: Relevancy Practices

2010-05-03 Thread Uwe Goetzke
Regarding Part 3, data quality: in our search domain (catalog products), we very often face the problem that the search data is full of acronyms and abbreviations such as "cable,nym-j,pvc,3x2.5mm²" or "dvd-/cd-/usb-carradio,4x50W,divx,bl". We solved this by a combination of normalization for better data

Re: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread Uwe Goetzke
Index everything into a directory and determine the size of all files in it. From http://lucene.apache.org/java/3_0_1/fileformats.html: Starting with Lucene 2.3, doc store files (stored field values and term vectors) can be shared in a single set of files for more than one segment. When compound file
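You can determine the total size of the files in an index directory with java.nio.file. This sketch requires Java 7 or later and uses a hypothetical class name, `IndexSize`. It sums only the regular files directly inside the directory, which matches Lucene's flat on-disk index layout. Subdirectories are ignored.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: total size in bytes of the files in one index directory.
final class IndexSize {
    private IndexSize() {}

    static long totalBytes(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) { // skip subdirectories and the like
                    total += Files.size(file);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(totalBytes(dir) + " bytes in " + dir);
    }
}
```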

Problem with sorting on NumericFields

2010-10-26 Thread Uwe Goetzke
I got stuck on a problem using NumericFields with Lucene 2.9.3. I add values to the document with doc.add(new NumericField(minprice).setDoubleValue(net_price)); If I want to search with a sorter on this field, I get this error: java.lang.NumberFormatException: Invalid shift

Re: Problem with sorting on NumericFields

2010-10-26 Thread Uwe Goetzke
should reindex everything, or at least optimize the index to get rid of deleted documents and their terms. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Goetzke [mailto:uwe.goet...@veenion.de