Re: german analyers xes me

2009-05-13 Thread Daniel Naber
On Tuesday 12 May 2009, Timon Roth wrote: The QueryParser is fed with the GermanAnalyzer and translates the phrase to offentlich finanx abgaberech. Have you checked the FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71 ? If that doesn't help,

Re: changes

2008-11-09 Thread Daniel Naber
On Friday 7 November 2008, ChadDavis wrote: For example, Field.Keyword() is gone. Shouldn't I find this in that change log? This was removed between 1.9 and 2.0. The plan was that users upgrade to 1.9, fix the deprecation warnings, and only then go to 2.0. Thus not every method is

Re: Strange behaviour of FrenchAnalyzer when using accents

2008-11-08 Thread Daniel Naber
On Saturday 8 November 2008, lamino wrote: String q = "secrétaire"; Does it help if you escape it like this: "secr\u00e9taire"? The Java compiler might interpret non-ASCII characters differently, depending on the environment it runs in. Regards Daniel -- http://www.danielnaber.de
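
A minimal, stdlib-only check of the suggested escape (the class and method names are made up for illustration, and passing the string to QueryParser is left out): the escaped literal is the same string as one built from code point U+00E9, independent of the encoding the source file is read with.

```java
final class UnicodeEscapeDemo {
    // javac translates Unicode escapes before it parses the source, so this
    // literal is always the 10-character word with U+00E9 at index 4,
    // whatever encoding the .java file is read with.
    static final String QUERY = "secr\u00e9taire";

    // The same word assembled from an explicit code point, for comparison.
    static String fromCodePoint() {
        return new StringBuilder("secr").appendCodePoint(0xE9).append("taire").toString();
    }

    public static void main(String[] args) {
        System.out.println(QUERY.length());                // 10
        System.out.println(QUERY.equals(fromCodePoint())); // true
    }
}
```

If the escaped form finds hits and the literal form does not, a mismatch between the file's encoding and the encoding javac assumes (its -encoding option) is a likely suspect.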

Re: Please help to interpret Lucene Boost results

2008-09-26 Thread Daniel Naber
On Friday 26 September 2008, student_t wrote: A. query1 = +(content:(Pepsi)) I guess this is the string input you use for your queries, isn't it? It's more helpful to look at the toString() output of the parsed query to see how Lucene interpreted your input. Regards Daniel --

Re: Lucene debug logging?

2008-09-04 Thread Daniel Naber
On Thursday 4 September 2008, Justin Grunau wrote: Is there a way to turn on debug logging / trace logging for Lucene? You can use IndexWriter's setInfoStream(). Besides that, Lucene doesn't do any logging AFAIK. Are you experiencing any problems that you want to diagnose with debugging?

Re: Case Sensitivity

2008-08-27 Thread Daniel Naber
On Wednesday 27 August 2008, Michael McCandless wrote: Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS? I think it's enough if the API doc explains it; no need to rename it. What's more confusing is that (UN_)TOKENIZED should actually be called (UN_)ANALYZED IMHO. Regards

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Andre Rubin wrote: For some reason, the TermQuery is not returning any results, even when querying for a single word (like on*). Sorry, I meant PrefixQuery. Also, do not add the * to the search string when creating the PrefixQuery. Regards Daniel --

Re: Combining Wildcard and Term Queries?

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Chris Bamford wrote: Can you combine these two queries somehow so that they behave like a PhraseQuery? You can use MultiPhraseQuery, see http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/search/MultiPhraseQuery.html Regards Daniel --

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Andre Rubin wrote: I just have one more use case. I want the same prefix search as before, plus another match in another field. Not sure if I'm following you, but you can create your own BooleanQuery programmatically, and then add the original PrefixQuery and any

Re: Combining Wildcard and Term Queries?

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Chris Bamford wrote: That sounds like what I'm after - but how do I get hold of the IndexReader so I can call IndexReader.terms(Term)? The code where I am doing this work is getFieldQuery(String field, String queryText) of my custom query parser ... QueryParser

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Andre Rubin wrote: Now I was the one who didn't follow: How do I add a query to an existing query? Something like this should work: BooleanQuery bq = new BooleanQuery(); PrefixQuery pq = new PrefixQuery(...); bq.add(pq, BooleanClause.Occur.MUST); TermQuery tq =

Re: MultiPhrase search

2008-08-25 Thread Daniel Naber
On Monday 25 August 2008, Andre Rubin wrote: I tried it out but with no luck (I think I did it wrong). In any case, is MultiPhraseQuery what I'm looking for? If it is, how should I use the MultiPhraseQuery class? No, you won't need it. If you know that the field is not really tokenized

Re: Unique list of keywords

2008-08-08 Thread Daniel Naber
On Friday 8 August 2008, Martin vWysiecki wrote: I have a lot of data, about 20GB of text, and need a unique list of keywords based on the text of all docs in the whole index. Simply use IndexReader.terms() to iterate over all terms in the index. You can then use

Re: Lucene performance issues..

2008-07-27 Thread Daniel Naber
On Sunday 27 July 2008, Mazhar Lateef wrote: We have also tried upgrading Lucene to version 2.3 in the hope of improving performance, but the results were quite the opposite. From my research on the internet, Lucene 2.3 is much faster and better, so why are we seeing such

Re: Boost token when storing document?

2008-07-13 Thread Daniel Naber
On Sunday 13 July 2008, Darren Govoni wrote: Hi, sorry if I missed this in the documentation, but I wanted to know if Lucene allows boosting of tokens _within_ a field when a document is stored? Yes, you can use payloads for that, see http://wiki.apache.org/lucene-java/Payloads

Re: too many clauses exception

2008-07-04 Thread Daniel Naber
On Friday 4 July 2008, Gaurav Sharma wrote: I am stuck with an exception in Lucene (too many clauses). When I use a wildcard such as a*, I get a too many clauses exception. It says the maximum clause count is set to 1024. Is there any way to increase this count? Please see

Re: document retrieval 100 times slower after finishing some heavy disk operation

2008-06-28 Thread Daniel Naber
On Sunday 29 June 2008, qaz zaq wrote: indexes, which usually take less than 16ms. However, every time after some heavy disk operation (such as copying a 1GB file onto that disk), document retrieval immediately slows down to a couple of seconds, even well after this disk operation being

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread Daniel Naber
On Monday 23 June 2008, László Monda wrote: According to the current Lucene documentation at http://lucene.apache.org/java/2_3_2/api/index.html it seems to me that the Query class doesn't have any explain() methods. It's in the IndexSearcher and it takes a query and a document number as its

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday 18 June 2008, László Monda wrote: Since fuzzy searching is based on the Levenshtein distance, the distance between coldplay and coldplay is 0 and the distance between coldplay and downplay is 3, so how on earth is it possible that when searching for coldplay, Lucene returns
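
A sketch of the arithmetic behind this question. The distance is easy to confirm; what explains the surprise is that, assuming Lucene 2.x FuzzyQuery behavior with no required prefix (a similarity of 1 - distance / length of the shorter term, compared against a default minimum similarity of 0.5), a distance of 3 on an 8-letter word still counts as similar enough. The formula in similarity() is that assumption, not code copied from Lucene; the class name is hypothetical.

```java
final class FuzzyDistanceDemo {
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions, computed with two rolling DP rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed Lucene 2.x fuzzy similarity with no required prefix:
    // 1 - distance / min(length of a, length of b).
    static float similarity(String a, String b) {
        return 1.0f - (float) levenshtein(a, b) / Math.min(a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("coldplay", "downplay")); // 3
        System.out.println(similarity("coldplay", "downplay"));  // 0.625
    }
}
```

Under that assumption, 0.625 is above the 0.5 default, so "downplay" can be among the expanded terms; passing a higher minimum similarity to the FuzzyQuery would exclude it.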

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday 18 June 2008, László Monda wrote: Additional info: Lucene seems to do the right thing when only a few documents are present, but goes crazy when there are about 1.5 million documents in the index. Lucene works well with more documents (currently using it with 9 million), but the

Re: Displaying and highlighting results from a Wild Card and Fuzzy search using Lucene in Java

2008-06-01 Thread Daniel Naber
On Sunday 1 June 2008, syedfa wrote: I am trying to display my results from a search of an XML document (some quotes from Shakespeare's Hamlet) using a wildcard and fuzzy search, and then I'm trying to highlight the keyword(s) in the results, but unfortunately I am having problems.

Re: Fwd: Snowball not finding purple

2008-05-10 Thread Daniel Naber
On Saturday 10 May 2008, Stephen Cresswell wrote: For some reason it seems that either Lucene or Snowball has a problem with the color purple. According to the Snowball experts, the problem is with Lucene. Can anyone shed any light? Thanks. You are aware that you need to use the same analyzer

Re: Fwd: Snowball not finding purple

2008-05-10 Thread Daniel Naber
On Saturday 10 May 2008, Stephen Cresswell wrote: If it was a difference between indexing and querying, why would Lucene find the word ribbon and not purple even though they appear in the same document and are both exact matches? Using Snowball, purple becomes purpl but ribbon isn't modified,
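
A toy illustration of the index/query mismatch. Here toyStem() is a hypothetical stand-in that only strips a trailing e (it is not Snowball): when the index holds stemmed terms but the query is not stemmed, ribbon still matches because stemming leaves it unchanged, while purple does not.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class AnalyzerMismatchDemo {
    // Hypothetical stand-in for a stemmer: it only strips a trailing "e" from
    // words longer than three letters. That reproduces purple -> purpl and
    // leaves ribbon alone, but it is NOT the Snowball algorithm.
    static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.length() > 3 && t.endsWith("e") ? t.substring(0, t.length() - 1) : t;
    }

    // Index side: every whitespace-separated token is stemmed.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(toyStem(token));
        }
        return terms;
    }

    // Query side: the word is stemmed only if stemQuery is true.
    static boolean matches(Set<String> index, String word, boolean stemQuery) {
        String term = stemQuery ? toyStem(word) : word.toLowerCase(Locale.ROOT);
        return index.contains(term);
    }

    public static void main(String[] args) {
        Set<String> index = indexTerms("purple ribbon");     // {purpl, ribbon}
        System.out.println(matches(index, "ribbon", false));  // true
        System.out.println(matches(index, "purple", false));  // false
        System.out.println(matches(index, "purple", true));   // true
    }
}
```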

Re: Search for phrases

2008-04-15 Thread Daniel Naber
On Tuesday 15 April 2008, palexv wrote: I have untokenized phrases in the index. What query should I use? A simple TermQuery does not work. Probably PhraseQuery with an argument like java dev (no asterisk). If I try to use QueryParser, what analyzer should I use? Probably KeywordAnalyzer.

Re: Search for phrases

2008-04-14 Thread Daniel Naber
On Monday 14 April 2008, palexv wrote: For example, I need to search for java de* and receive java developers, java development, developed by java, etc. If your text is tokenized, this is not supported by QueryParser, but you can create such queries using MultiPhraseQuery. If you don't

Re: Compiled Term Highlighter

2008-02-09 Thread Daniel Naber
On Saturday 9 February 2008, Cesar Ronchese wrote: I'm not a Java developer, so I'm getting stuck compiling the Term Highlighter from source files acquired from the Lucene Sandbox. The highlighter is part of the release; in Lucene 2.3 it's under

Re: Escape character and Special character

2008-01-30 Thread Daniel Naber
On Wednesday 30 January 2008, Joshua W Hui wrote: When I tried to do a Lucene search using the escape character together with other special characters, like the following: SUBJECT:Yahoo\!~0.5 SUBJECT:Yahoo\!* it seems the parser totally ignores the escape character, and becomes It's a known bug, see

Re: Escape character and Special character

2008-01-30 Thread Daniel Naber
On Wednesday 30 January 2008, Joshua W Hui wrote: Thanks for the information. Does it also apply to fuzzy search? I think so. Also, a simple question... how can I find out which release the fix will go in? Currently, it only has a patch. It's not yet assigned to any version (it says Fix

Re: Retain the index

2008-01-27 Thread Daniel Naber
On Sunday 27 January 2008, anjana m wrote: IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true); The true parameter means that the old index will be deleted; is that your problem? Regards Daniel -- http://www.danielnaber.de

Re: Stemmers remove part of a query when using QueryParser

2008-01-26 Thread Daniel Naber
On Saturday 26 January 2008, Jay Hill wrote: I have added a stemming Analyzer to my indexing and searching. I've tried both Porter and KStem and have gotten very good results with both, KStem being the best. The only problem is that, when analyzing on the search end using QueryParser part of

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Daniel Naber
On Tuesday 22 January 2008, Fabrice Robini wrote: Oooops, sorry, bad cut/paste... Here is the right one :-) The score is the same, so documents with a lower id (inserted earlier) will be returned first. So everything looks okay to me, or am I missing something? Regards Daniel --

Re: Stemming and highlighting

2008-01-04 Thread Daniel Naber
On Friday 4 January 2008, Marjan Celikik wrote: I am a new Lucene user and I would like to know the following. How does Lucene bring together fuzzy queries and highlighting? You need to call rewrite() on the fuzzy query. This will expand the fuzzy query to all similar terms (e.g. belies~ -

Re: Prioritize new documents

2007-12-30 Thread Daniel Naber
On Sunday 30 December 2007, Dominik Bruhn wrote: Although I set the boost via doc.setBoost(value) for each document before writing it to the index, it doesn't change anything. Even worse, if I look at the index using Luke (version 0.7.1), each document has a boost of 1, not the value

Re: Analyzer to use with MultiSearcher using various indexes for multiple languages

2007-12-18 Thread Daniel Naber
On Tuesday 18 December 2007, Jay Hill wrote: We have a requirement to search across multiple languages, so I'm planning to use MultiSearcher, passing an array of all IndexSearchers for each language. You will need to analyze the query once per language and then build a new BooleanQuery

Re: SpellChecker: Spanish Dictionary

2007-12-13 Thread Daniel Naber
On Thursday 13 December 2007, Haroldo Nascimento wrote: I am using Lucene's SpellChecker classes to create the "Did you mean" feature. I need to load all the words of the Spanish language into memory (this will be my dictionary). Where can I get (download) this dictionary? Maybe as a .txt file.

Re: content depending Analyzing

2007-12-10 Thread Daniel Naber
On Monday 10 December 2007, Helmut Jarausch wrote: an Analyzer implements 'TokenStream tokenStream(String fieldName, Reader reader)', but for me that's too late. When tokenizing the TOC field I would need access to the LANG field to decide how to tokenize. IndexWriter contains an addDocument() call

Re: Explanation

2007-11-23 Thread Daniel Naber
On Saturday 24 November 2007, John Griffin wrote: System.out.println(indexSearcher.explain(query, counter).toString()); I think you need to use hits.id() instead of counter. Regards Daniel -- http://www.danielnaber.de

Re: AND query in SHOULD

2007-11-22 Thread Daniel Naber
On Thursday 22 November 2007, Rapthor wrote: I want to implement a search that finds the exact phrase I provide. You simply need to create a PhraseQuery. See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/PhraseQuery.html Regards Daniel --

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-20 Thread Daniel Naber
On Monday 19 November 2007, flateric wrote: The number returned by delete is 0, but the uid shows up in Luke, so it is there. Not sure what the problem might be, but it can surely be analyzed if you write a small self-contained test case and post it here. Regards Daniel --

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-19 Thread Daniel Naber
On Sunday 18 November 2007, flateric wrote: IndexReader ir = IndexReader.open(fsDir); ir.deleteDocuments(new Term("uid", uid)); ir.close(); Has absolutely no effect. What number does ir.deleteDocuments return? If it's 0, the uid cannot be found. If it's > 0: note that you need to re-open

Re: Can I use Ispell dictionaries roe analizers in Lucene?

2007-11-18 Thread Daniel Naber
On Sunday 18 November 2007, Alebu wrote: 1. To analyze a non-English language I need to use a specific analyzer. You don't have to, but it helps improve recall. Can I use Ispell dictionaries with Lucene? It depends on the dictionary. Some dictionary authors use the ispell flagging system

Re: Can I use Ispell dictionaries roe analizers in Lucene?

2007-11-18 Thread Daniel Naber
On Sunday 18 November 2007, Alebu wrote: So what is an ispell dictionary, actually? A list of rules for translating some words (or sentences?) to a 'base form'? Or what? It's a list of terms with optional flags. For example: walk/xy. In a different file, the flag x would then be defined as append
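
A sketch of how such an entry expands into word forms. Because the flag definitions are cut off above, the mappings used here (x appends s, y appends ed) are hypothetical, and real ispell affix files can also strip characters, add prefixes and attach conditions to a rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IspellFlagDemo {
    // Hypothetical affix table for illustration only: flag -> suffix to append.
    static Map<Character, String> exampleAffixes() {
        Map<Character, String> affixes = new HashMap<>();
        affixes.put('x', "s");
        affixes.put('y', "ed");
        return affixes;
    }

    // Expands an entry such as "walk/xy" into the stem plus one form per known flag.
    static List<String> expand(String entry, Map<Character, String> suffixByFlag) {
        int slash = entry.indexOf('/');
        String stem = slash < 0 ? entry : entry.substring(0, slash);
        String flags = slash < 0 ? "" : entry.substring(slash + 1);
        List<String> forms = new ArrayList<>();
        forms.add(stem);
        for (char flag : flags.toCharArray()) {
            String suffix = suffixByFlag.get(flag); // char is boxed to Character
            if (suffix != null) {
                forms.add(stem + suffix);           // unknown flags are skipped
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(expand("walk/xy", exampleAffixes())); // [walk, walks, walked]
    }
}
```

An analyzer could use such expansions in the other direction, mapping each generated form back to its stem.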

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-18 Thread Daniel Naber
On Sunday 18 November 2007, flateric wrote: Has absolutely no effect. I also tried delete on the IndexWriter - no effect. Please use the tool Luke to have a look inside your index to see if a document with field uid and the uid you're expecting really exists. The field should be

Re: OutOfMemoryError on small search in large, simple index

2007-11-13 Thread Daniel Naber
On Tuesday 13 November 2007, Lars Clausen wrote: Can it be right that memory usage depends on the size of the index rather than the size of the result? Yes, see IndexWriter.setTermIndexInterval(). How much RAM are you giving to the JVM now? Regards Daniel -- http://www.danielnaber.de

Re: Hits.score mystery

2007-11-01 Thread Daniel Naber
On Wednesday 31 October 2007 19:14, Tom Conlon wrote: 119.txt 17.865013 97% (13 occurrences); 45.txt 8.600986 47% (18 occurrences). 45.txt might be a document with more terms, so that its score is lower although it contains more matches. Regards Daniel --

Re: Question regarding proximity search

2007-11-01 Thread Daniel Naber
On Thursday 01 November 2007 10:45, Sonu SR wrote: I got confused by proximity search. I am getting different results for the queries TTL:"test device"~2 and TTL:"device test"~2. Order is significant; this is described here:

Re: org.apache.lucene.analysis.ngram ???

2007-10-30 Thread Daniel Naber
On Tuesday 30 October 2007 11:57, Marco wrote: I'm trying to use the class org.apache.lucene.analysis.ngram.EdgeNGramTokenizer. I'm using Lucene 2.2.0 and I included lucene-core-2.2.0.jar in my classpath. I have: That class is in contrib/analyzers/lucene-analyzers-2.2.0.jar Regards Daniel
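
For reference, front edge n-grams are the leading substrings of the input. A stdlib-only approximation of that output (not the contrib class itself; the method name, the single-token input and the gram sizes are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeNGramDemo {
    // Front-edge n-grams of one token: its prefixes of length minGram..maxGram,
    // with maxGram capped at the token length.
    static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int max = Math.min(maxGram, token.length());
        for (int n = minGram; n <= max; n++) {
            grams.add(token.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(frontEdgeNGrams("lucene", 1, 3)); // [l, lu, luc]
    }
}
```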

Re: Exception with org.apache.lucene.store.Directory

2007-10-27 Thread Daniel Naber
On Saturday 27 October 2007 13:20, dinesh chothe wrote: <%@ page import="org.apache.lucene.store.Directory.*" %> That's a class, not a package, so try: <%@ page import="org.apache.lucene.store.Directory" %> The same applies to the other classes. Regards Daniel -- http://www.danielnaber.de

Re: Exception with org.apache.lucene.store.Directory

2007-10-27 Thread Daniel Naber
On Saturday 27 October 2007 15:11, dinesh chothe wrote: Thanks for your reply. I have changed all my imports. Even though I am using <%@ page import="org.apache.lucene.store.Directory" %>, I am still getting the same error. Are your JAR files (Lucene etc.) in WEB-INF/lib in your web

Re: fuzzy search MultifieldQueryParser - Lucene 2.2

2007-10-26 Thread Daniel Naber
On Friday 26 October 2007 19:06, Zdeněk Vráblík wrote: It works if the query string ends with ~, but how do I switch it on for the whole query? That's not supported AFAIK. You will need to iterate over the query (recursively if it's an instance of BooleanQuery) and create a new query where all parts are
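
A sketch of the recursive walk described above, using a hypothetical three-class query model instead of Lucene's classes so that it runs on its own. In real code the checks would be against BooleanQuery and TermQuery, and each term would be replaced by a FuzzyQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FuzzyRewriteDemo {
    // Hypothetical mini query model, NOT Lucene's classes: a node is a term,
    // a fuzzy term, or a boolean group of clauses.
    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class Fuzzy implements Node {
        final String text;
        Fuzzy(String text) { this.text = text; }
        @Override public String toString() { return text + "~"; }
    }

    static final class Bool implements Node {
        final List<Node> clauses;
        Bool(List<Node> clauses) { this.clauses = clauses; }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < clauses.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(clauses.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Builds a new tree in which every plain term becomes a fuzzy term,
    // recursing into boolean groups; the input tree is left unchanged.
    static Node toFuzzy(Node node) {
        if (node instanceof Term) {
            return new Fuzzy(((Term) node).text);
        }
        if (node instanceof Bool) {
            List<Node> converted = new ArrayList<>();
            for (Node child : ((Bool) node).clauses) {
                converted.add(toFuzzy(child));
            }
            return new Bool(converted);
        }
        return node; // already fuzzy: keep as is
    }

    // Example tree that prints as "(a (b c))".
    static Node example() {
        return new Bool(Arrays.<Node>asList(new Term("a"),
                new Bool(Arrays.<Node>asList(new Term("b"), new Term("c")))));
    }

    public static void main(String[] args) {
        System.out.println(toFuzzy(example())); // (a~ (b~ c~))
    }
}
```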

Re: Sort by date with Lucene 2.2.0 ...

2007-10-23 Thread Daniel Naber
On Tuesday 23 October 2007 15:57, Dragon Fly wrote: I tried specifying the field type using a SortField object but I got the same result.  I'll be glad to write a stand-alone test case.  Should I post the code to this thread when I'm done or should I submit some sort of bug report? Thanks.

Re: MoreLikeThis across multiple fields question...

2007-10-21 Thread Daniel Naber
On Sunday 21 October 2007 17:21, Chris Sizemore wrote: I'm using MoreLikeThis. I'm trying to run the document comparison across more than one field in my index, but I'm not at all sure that it's actually happening -- when I examine the constructed query, only one field is mentioned! Here's my

Re: Sort by date with Lucene 2.2.0 ...

2007-10-19 Thread Daniel Naber
On Thursday 18 October 2007 21:35, Dragon Fly wrote: I am trying to sort a date field in my index but I'm seeing strange results. I have searched the Lucene user mail archive for DateTools but still couldn't figure out the problem. It shouldn't make a difference but does it help if you
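
A date field sorts correctly as a string only if its values are zero-padded with the most significant part first, which is the shape DateTools strings have (for example yyyyMMdd at day resolution). A stdlib-only sketch comparing that with an unpadded, day-first format (the sample dates are arbitrary, and the class name is hypothetical):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DateSortDemo {
    static final List<LocalDate> SAMPLE = Arrays.asList(
            LocalDate.of(2007, 10, 18), LocalDate.of(2007, 9, 5), LocalDate.of(2006, 12, 24));

    // Formats every date with the given pattern and sorts the resulting
    // strings lexicographically, as a plain string sort would.
    static List<String> sortedAsStrings(List<LocalDate> dates, DateTimeFormatter fmt) {
        List<String> out = new ArrayList<>();
        for (LocalDate d : dates) {
            out.add(d.format(fmt));
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // Zero-padded, most significant field first: string order equals date order.
        System.out.println(sortedAsStrings(SAMPLE, DateTimeFormatter.BASIC_ISO_DATE));
        // [20061224, 20070905, 20071018]
        // Unpadded, day-first pattern: string order is not date order.
        System.out.println(sortedAsStrings(SAMPLE, DateTimeFormatter.ofPattern("d.M.yyyy")));
        // [18.10.2007, 24.12.2006, 5.9.2007]
    }
}
```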

Re: Norm - please lit it up for me

2007-10-19 Thread Daniel Naber
On Friday 19 October 2007 19:07, Karl Wettin wrote: doc[0] text: "hello hello hello"; doc[1] text: "hello". With normalization, doc[0] and doc[1] are equally important. Omitting normalization makes doc[0] (usually) three times as important as doc[1]. Not quite, as the normalization only refers
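
The arithmetic for these two documents, assuming DefaultSimilarity's defaults (tf = square root of the term frequency, lengthNorm = 1 / square root of the number of terms) and deliberately ignoring idf, boosts, queryNorm and the lossy one-byte norm encoding. The class name is hypothetical.

```java
final class NormDemo {
    // Assumed DefaultSimilarity defaults: tf(freq) = sqrt(freq) and
    // lengthNorm(numTerms) = 1 / sqrt(numTerms).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Per-field factor for one query term; idf, boosts, queryNorm and the
    // one-byte norm encoding are left out.
    static double weight(int freq, int numTerms, boolean useNorms) {
        return tf(freq) * (useNorms ? lengthNorm(numTerms) : 1.0);
    }

    public static void main(String[] args) {
        // doc[0] = "hello hello hello": freq 3, 3 terms; doc[1] = "hello": freq 1, 1 term
        System.out.println(weight(3, 3, true) + " vs " + weight(1, 1, true));   // about 1.0 vs 1.0
        System.out.println(weight(3, 3, false) + " vs " + weight(1, 1, false)); // about 1.732 vs 1.0
    }
}
```

Under these assumptions, dropping norms makes doc[0] about 1.73 times (the square root of 3) as strong as doc[1], not three times.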

Re: Sample SynonymAnalyzer vs. Lucene 2.2

2007-10-19 Thread Daniel Naber
On Friday 19 October 2007 14:42, Sean Dague wrote: Ends up only indexing the synonym, but not the base word itself. I cannot reproduce the problem, i.e. I see both the original term and its synonyms in the index. Maybe you can post the analyzer that uses this filter or a test case to

Re: Problems with stemming/SpellChecker

2007-10-13 Thread Daniel Naber
On Saturday 13 October 2007 07:57, Christian Aschoff wrote: But as far as I can see (in the API docs), the GermanAnalyzer is attached to the IndexWriter; I can't find a way to attach an analyzer to a single field... Or am I missing something? See PerFieldAnalyzerWrapper. Regards Daniel --

Re: Problems with stemming/SpellChecker

2007-10-12 Thread Daniel Naber
On Friday 12 October 2007 15:48, Christian Aschoff wrote: indexWriter = new IndexWriter(MiscConstants.luceneDir, new GermanAnalyzer(), create); [...] NO_NORMS is not the problem, GermanAnalyzer is. Try StandardAnalyzer on the field you get the suggestions from. Regards Daniel --

Re: Weird operator precedence with default operator AND

2007-10-09 Thread Daniel Naber
On Tuesday 09 October 2007 09:55, Martin Dietze wrote: I've been going nuts trying to use LuceneParser to parse query strings using the default operator AND correctly: The operator precedence is known to be buggy. You need to use parentheses, e.g. (aa AND bb) OR (cc AND dd) Regards Daniel --

Re: Indexing Speed using Java Lucene 2.0 and Lucene.NET 2.0

2007-09-10 Thread Daniel Naber
On Monday 10 September 2007 14:59, Laxmilal Menaria wrote: I have created an indexing application using Java Lucene 2.0 in Java and Lucene.Net 2.0 in VB.NET. Both applications have the same logic. But when I indexed a database with 14000 rows from both applications on the same machine, I was surprised

Re: Reading Existing index

2007-08-11 Thread Daniel Naber
On Saturday 11 August 2007 02:20, Aleesh wrote: I need your help regarding reading an existing index. I am trying to read an existing index and just wanted to know: is there a way to identify the type of 'Analyzer' that was used at index creation time? That information is not part of

Re: Fastest way to perform 'like' searches

2007-08-08 Thread Daniel Naber
On Wednesday 08 August 2007 10:28, Ard Schrijvers wrote: Does anybody know a more efficient way? A PhraseQuery might get me somewhere, right? No, you need to use MultiPhraseQuery, and you will need to first expand the terms with the * yourself (e.g. using term enumeration). as a phrase is
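
A sketch of the manual expansion step, with a hypothetical sorted TreeSet standing in for the index's term dictionary: each wildcard word is replaced by every term sharing its prefix, giving one list of alternatives per phrase position, which is the shape MultiPhraseQuery.add(Term[]) takes position by position. Building the actual Lucene query is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class PrefixExpansionDemo {
    // Hypothetical in-memory stand-in for the index's sorted term dictionary.
    static final TreeSet<String> TERMS = new TreeSet<>(Arrays.asList(
            "by", "developed", "developers", "development", "java", "net"));

    // Every term that starts with prefix, in sorted order. Because terms are
    // sorted, the matches form one contiguous range starting at prefix itself;
    // the upper bound is prefix followed by the largest char value.
    static List<String> expand(TreeSet<String> terms, String prefix) {
        return new ArrayList<>(terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false));
    }

    // Turns a phrase such as "java de*" into one list of alternatives per
    // position; words ending in * are expanded, other words are kept as-is.
    static List<List<String>> expandPhrase(TreeSet<String> terms, String phrase) {
        List<List<String>> positions = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            if (word.endsWith("*")) {
                positions.add(expand(terms, word.substring(0, word.length() - 1)));
            } else {
                positions.add(Arrays.asList(word));
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(expandPhrase(TERMS, "java de*"));
        // [[java], [developed, developers, development]]
    }
}
```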

Re: docFreq takes long time to execute in a multiple index environment

2007-08-06 Thread Daniel Naber
On Monday 06 August 2007 01:40, tierecke wrote: Term term = new Term("contents", termstr); TermEnum termenum = multireader.terms(term); int freq = termenum.docFreq(); IndexReader has a docFreq() method; no need to get a term enumeration. Regards Daniel --

Re: Query parsing?

2007-07-25 Thread Daniel Naber
On Wednesday 25 July 2007 00:44, Lindsey Hess wrote: Now, I do not need Lucene to index anything, but I'm wondering if Lucene has query parsing classes that will allow me to transform the queries. The Lucene QueryParser class can parse the format described at

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread Daniel Naber
On Monday 21 May 2007 22:05, bhecht wrote: Is there any point in creating custom analyzers with filters for stop words and synonyms, and implementing my own substring filter for separating tokens into subwords (like mainstrasse => main, strasse)? Yes: I assume your document should
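
A minimal sketch of dictionary-based decompounding of the kind described; the word list and class name are hypothetical, and a usable splitter needs a large dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DecompoundDemo {
    // Hypothetical word list for illustration only.
    static final Set<String> DICTIONARY = new HashSet<>(Arrays.asList("main", "haupt", "strasse"));

    // Splits word into dictionary words if that is possible, otherwise returns
    // the word unchanged as a single part.
    static List<String> split(String word, Set<String> dict) {
        List<String> parts = trySplit(word, dict);
        if (parts == null) {
            parts = new ArrayList<>();
            parts.add(word);
        }
        return parts;
    }

    // Tries the longest dictionary prefix first and recurses on the remainder,
    // backtracking to shorter prefixes; returns null if no complete split exists.
    private static List<String> trySplit(String word, Set<String> dict) {
        if (word.isEmpty()) {
            return new ArrayList<>();
        }
        for (int end = word.length(); end > 0; end--) {
            String head = word.substring(0, end);
            if (!dict.contains(head)) {
                continue;
            }
            List<String> rest = trySplit(word.substring(end), dict);
            if (rest != null) {
                rest.add(0, head);
                return rest;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(split("mainstrasse", DICTIONARY));  // [main, strasse]
        System.out.println(split("hauptstrasse", DICTIONARY)); // [haupt, strasse]
        System.out.println(split("berlin", DICTIONARY));       // [berlin]
    }
}
```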

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread Daniel Naber
On Monday 21 May 2007 22:53, bhecht wrote: If someone searches for mainstrasse, my tools will split it again into main and strasse, and then Lucene will be able to find it. strasse will match mainstrasse, but the phrase query "schöne strasse" will not match "schöne mainstrasse". However, this could

Re: Simple, always do wildcard or fuzzy query

2007-05-11 Thread Daniel Naber
On Thursday 10 May 2007 23:09, bbrown wrote: I think this is a simple question, or maybe not. Is there a way to automatically convert all tokens to wildcard queries for any given input? Either just append the * before you pass your terms, or extend QueryParser and override getFieldQuery() to

Re: Lucene Developer

2007-05-11 Thread Daniel Naber
On Friday 11 May 2007 19:21, Chris Hostetter wrote: Please do not send resume requests to any of the @lucene email lists. There is a wiki page listing parties available for hire who are knowledgeable in Lucene for this explicit purpose... http://wiki.apache.org/lucene-java/Support

Re: search problem/odd results

2007-05-09 Thread Daniel Naber
On Wednesday 09 May 2007 16:17, John Powers wrote: Yes, it doesn't work. It gives a modal error dialog box that says IMPL. Is there a more useful error message when you start Luke from the command line and try to open the index? Regards Daniel -- http://www.danielnaber.de

Re: Locking in Lucene 2.1

2007-05-09 Thread Daniel Naber
On Wednesday 09 May 2007 21:18, Andreas Guther wrote: Do I miss something here or is the documentation not updated? Looks like that part of the documentation isn't up-to-date. The file is called write.lock and it's stored in the index directory. Could you file an issue so the documentation

Re: search problem/odd results

2007-05-08 Thread Daniel Naber
On Tuesday 08 May 2007 23:42, John Powers wrote: I've had problems with luke in the past not being able to read the files. Just make sure you specify the directory, not the files when opening an index with Luke. Also use the latest version (0.7). Regards Daniel --

Re: Leading and trailing wildcard together

2007-04-21 Thread Daniel Naber
On Saturday 21 April 2007 17:16, Mohsen Saboorian wrote: However, I wasn't able to search for *Foo* (while ?Foo* and even ?*Foo* work). Is it possible to have leading and trailing star wildcards together? That's a bug in the 2.1 release which has been fixed in SVN trunk. There's also a patch

Re: Issue with : Searcher.search() returning Hits of same length for different searches

2007-04-11 Thread Daniel Naber
On Wednesday 11 April 2007 18:51, Lokeya wrote: Thanks for your reply. I should have given more information and will keep in mind this for my future queries. If nothing else helps, please write a small, standalone test-case that shows the problem. This can then easily be debugged by someone

Re: luke v0.7 and SnowBallAnalyzer

2007-04-06 Thread Daniel Naber
On Thursday 05 April 2007 17:07, Paul Hermans wrote: I do receive the message java.lang.ClassNotFound: net.sf.snowball.ext.GermansStemmer. This class is not part of the lukeall-0.7.jar, but it's in lucene-snowball-2.1.0.jar (which you can find on the Luke homepage). You will then need to

Re: Fwd: Unable to retrieve 2/13 field values

2007-02-27 Thread Daniel Naber
On Tuesday 27 February 2007 19:21, Michael Barbarelli wrote: GB821628930 (+VAT_reg:GB* doesn't work) What about VAT_reg:gb*? Also see QueryParser.setLowercaseExpandedTerms() Regards Daniel -- http://www.danielnaber.de

Re: indexing and searching the document title question

2007-02-27 Thread Daniel Naber
On Tuesday 27 February 2007 23:07, Phillip Rhodes wrote: NAME:"color me mine"^2.0 (CONTENTS:color CONTENTS:me CONTENTS:mine) Try a (much) higher boost like 20 or 50; does that help? Regards Daniel -- http://www.danielnaber.de

Re: possible to disable internal caching?

2007-02-14 Thread Daniel Naber
On Wednesday 14 February 2007 17:12, jm wrote: So my question is: is it possible to disable some of the caching Lucene does so that memory consumption will be smaller (I am a bit concerned about memory usage)? Or would the memory savings not pay off? You could set

Re: Merge factor problem,

2007-02-09 Thread Daniel Naber
On Friday 09 February 2007 17:14, Sairaj Sunil wrote: I have increased the merge factor from 10 to 50. Please try increasing setMaxBufferedDocs() instead, does that help? Regards Daniel -- http://www.danielnaber.de

Re: Slow performance (Fetching Hits)

2007-02-08 Thread Daniel Naber
On Thursday 08 February 2007 13:54, Laxmilal Menaria wrote: This will take more than 30 secs for 150,000 docs (40 MB index). What exactly takes this much time? You're not iterating over all hits, are you? Also see

Re: upgrading from Lucene 1.4.3 to Lucene 2.0

2007-02-06 Thread Daniel Naber
On Tuesday 06 February 2007 13:21, [EMAIL PROTECTED] wrote: Which performance improvements can I expect when upgrading from Lucene 1.4.3 to Lucene 2.0? This is difficult to say, but you can probably update to Lucene 1.9 without making any changes to your code and then run a performance test.

Re: search keyword Fields +Text Field with BooleanQuery

2007-01-22 Thread Daniel Naber
On Monday 22 January 2007 17:19, Xue, Yijun wrote: When I try the query Secondname:Beckwith AND Firstname:Louise AND content:school in Luke with WhitespaceAnalyzer, I get hits, but nothing if I use StandardAnalyzer. You need to use the same analyzer for indexing and searching. For example,

Re: rewriting wildcard query before highlighting

2007-01-18 Thread Daniel Naber
On Thursday 18 January 2007 14:48, Mark Miller wrote: Would it be more efficient to make a RAM index with just the doc to be highlighted and then pass the reader of that into the rewrite method before highlighting a query that expands? Yes, that's a valid approach, especially using

Re: confuse of required and prohibited in BooleanQuery

2007-01-17 Thread Daniel Naber
On Wednesday 17 January 2007 11:30, David wrote: 2. There are four logical combinations of these flags, but the case where both are true is an illogical and invalid combination. But I don't know why; can anybody explain it to me? You're right. Because of this the API was changed in Lucene

Re: Remote Searcher performance and Document retrieval

2007-01-08 Thread Daniel Naber
On Monday 08 January 2007 23:08, sashaman wrote: Can anyone comment on this performance issue? Have you compared to a local index? It's not uncommon for several doc() calls to take more time than searching, as doc() requires a lot of I/O, even locally. Regards Daniel --

Re: lucene injection

2006-12-21 Thread Daniel Naber
On Thursday 21 December 2006 10:56, Deepan wrote: I am worried about security problems with Lucene. Is it vulnerable to any kind of injection, like MySQL injection? Many times the query from the user is passed to Lucene for search without validation. This is only an issue if your index has

Re: boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Daniel Naber
On Thursday 21 December 2006 10:55, Martin Braun wrote: and in my case I have some documents which have the same values in many fields (=same score) and the only difference is the year. Andrzej's response sounds like a good solution, so just for completeness: you can sort by more than one
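
The idea of sorting by several keys, shown with plain Java comparators rather than Lucene's Sort and SortField classes (the Hit type and the sample values are hypothetical): relevance decides first, and the year only breaks ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class MultiKeySortDemo {
    // Hypothetical search hit: relevance score plus a stored year.
    static final class Hit {
        final String id;
        final float score;
        final int year;

        Hit(String id, float score, int year) {
            this.id = id;
            this.score = score;
            this.year = year;
        }
    }

    // Primary key: score, highest first. Tie-breaker: year, newest first.
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingInt((Hit h) -> h.year).reversed());

    static final List<Hit> SAMPLE = Arrays.asList(
            new Hit("a", 1.0f, 1973), new Hit("b", 1.0f, 1975), new Hit("c", 2.0f, 1960));

    // Returns the ids of the hits in ORDER, without modifying the input list.
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(ORDER);
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(sortedIds(SAMPLE)); // [c, b, a]
    }
}
```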

Re: to boost or not to boost

2006-12-20 Thread Daniel Naber
On Wednesday 20 December 2006 17:32, Martin Braun wrote: so a doc from 1973 should get a boost of 1.1973 and a doc from 1975 should get a boost of 1.1975. The boost is stored with a limited resolution. Try boosting one doc by 10, the other one by 20 or something like that. Regards Daniel --
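
A simplified model of why 1.1973 and 1.1975 collapse: in Lucene 2.x the boost ends up in the field norm, which is stored in a single byte with only 3 mantissa bits. The sketch just truncates a float to 3 mantissa bits; it ignores the limited exponent range and the multiplication with the length norm that the real encoding involves, and the class name is hypothetical.

```java
final class BoostResolutionDemo {
    // Keeps only the top 3 of the 23 mantissa bits of a float by clearing the
    // low 20 bits (truncation). This mimics the precision of a one-byte norm
    // but ignores its limited exponent range.
    static float keepThreeMantissaBits(float f) {
        int bits = Float.floatToIntBits(f);
        return Float.intBitsToFloat(bits & ~((1 << 20) - 1));
    }

    public static void main(String[] args) {
        System.out.println(keepThreeMantissaBits(1.1973f)); // 1.125
        System.out.println(keepThreeMantissaBits(1.1975f)); // 1.125, same as above
        System.out.println(keepThreeMantissaBits(10f));     // 10.0
        System.out.println(keepThreeMantissaBits(20f));     // 20.0
    }
}
```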

Re: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-19 Thread Daniel Naber
On Tuesday 19 December 2006 23:05, Scott Sellman wrote: new BooleanClause.Occur[]{BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD} Why do you explicitly specify these operators? q.add(keywordQuery, BooleanClause.Occur.MUST); //true, false); You seem to wrap a

Re: Indexing clarification , please advice

2006-12-13 Thread Daniel Naber
On Wednesday 13 December 2006 14:10, abdul aleem wrote: a) Indexing a large file (more than 4MB): Do I need to read the entire file as a string using java.io and create a Document object? You can also use a reader:

Re: de-boosting fields

2006-12-09 Thread Daniel Naber
On Saturday 09 December 2006 02:25, Scott Smith wrote: What is the best way to do this?  Is changing the boost the right answer?  Can a field's boost be zero? Yes, just use: term1 term2 category1^0 category2^0. Erick's Filter idea is also useful. Regards Daniel --

Re: Lucene search performance: linear?

2006-12-05 Thread Daniel Naber
On Tuesday 05 December 2006 03:49, Zhang, Lisheng wrote: I found that search time is about linear: the 2nd query takes about twice as long as the 1st. What exactly did you measure, only the search() or also opening the IndexSearcher? The latter depends on index size, thus you shouldn't re-open

Re: Customized Analyzer

2006-12-05 Thread Daniel Naber
On Tuesday 05 December 2006 21:37, Alice wrote: It does not work. Even with the synonyms indexed it is not found. So if your text contains wind it is not found by the query that prints as content:(wind window)? Then I suggest you post a small test case that shows this problem. As Chris

Re: Customized Analyzer

2006-12-05 Thread Daniel Naber
On Tuesday 05 December 2006 20:14, Alice wrote: It returns content:(wind window) That might be the correct representation of a MultiPhraseQuery. So does your query work anyway? It's just that you cannot use QueryParser again to parse this output (similar to some other queries like

too many parentheses confuse Lucene

2006-12-05 Thread Daniel Naber
Hi, a query like (-merkel) AND schröder is parsed as +(-body:merkel) +body:schröder. I get no hits for this query because +(-body:merkel) doesn't return any hits (it's not a valid query for Lucene). However, a query like -merkel AND schröder works fine. From the user's point of view, both

Phrase queries with wildcards

2006-12-03 Thread Daniel Naber
Hi, Lucene's phrase queries don't support wildcards and I'm thinking about the best way to fix this. One way would be to change QueryParser so that it builds a MultiPhraseQuery when it encounters a wildcard inside a phrase. However, to expand the wildcard the QueryParser needs an IndexReader.

Re: BUG ? - lucene multisearcher / sorting

2006-12-01 Thread Daniel Naber
On Friday 01 December 2006 15:16, Kai R. Emde wrote: When we search for material, for example, we find 207 hits in the index. When we search this index in the MultiSearcher with 3 indexes, there are 206 contiguous hits and one after the next. OK bookA1, bookA2, bookA3 ... bookA206, bookB1,

Re: BUG ? - lucene multisearcher / sorting

2006-11-29 Thread Daniel Naber
On Tuesday 28 November 2006 23:09, Kai R. Emde wrote: We have a problem with the sort routine. We use MultiSearcher over several indexes. Does that also happen when you're not using MultiSearcher? Could you post a small test case that demonstrates this problem? To my knowledge,

Re: NOT queries

2006-11-21 Thread Daniel Naber
On Tuesday 21 November 2006 23:14, Antony Bowesman wrote: I assume that you first have to create a BooleanClause that finds everything and then another Clause that removes the attribute. Is this right or is there another way to do it? That's correct. For the find everything part you can
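
A set-based sketch of why the "find everything" clause is needed (a simplified model with hypothetical document ids, not Lucene's implementation): prohibited clauses only remove documents from what positive clauses matched, so a query consisting of nothing but a prohibited clause returns nothing.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class PureNegationDemo {
    // Hypothetical 5-document index; ALL_DOCS plays the "find everything" clause.
    static final Set<Integer> ALL_DOCS = new TreeSet<>(Arrays.asList(0, 1, 2, 3, 4));
    static final Set<Integer> WITH_ATTRIBUTE = new TreeSet<>(Arrays.asList(1, 3));

    // Set model of a boolean query: prohibited clauses can only remove documents
    // from what the positive clauses matched; they never add any.
    static Set<Integer> evaluate(Set<Integer> positiveMatches, Set<Integer> prohibited) {
        Set<Integer> result = new TreeSet<>(positiveMatches);
        result.removeAll(prohibited);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new TreeSet<Integer>(), WITH_ATTRIBUTE)); // []
        System.out.println(evaluate(ALL_DOCS, WITH_ATTRIBUTE));               // [0, 2, 4]
    }
}
```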

Japanese word segmentation

2006-11-18 Thread Daniel Naber
Hi, does anybody know a (more or less) ready-to-use free Japanese analyzer? I know I can use CJKAnalyzer but I need one that puts only real words into the index (not just n-grams). There seem to be a lot of papers on the Web and there's also Juma, but I'm looking for a Java-based solution.

Re: Indexing Performance issue

2006-11-10 Thread Daniel Naber
On Friday 10 November 2006 12:18, spinergywmy wrote: I am having a performance issue indexing PDF files. It took me more than 10 sec to index a PDF file of about 200KB. Is it because I only have one segment file? How can I make indexing faster? PDFBox (which I assume you are
