Re: Searching repeating fields

2008-11-19 Thread Eran Sevi
If you don't have a lot of entries for each invoice you can duplicate the invoice for each entry - you'll have some field duplications (and bigger index size) between the different invoices but it'll be easy to find exactly what you want. If you have too many different values, I built a solution

Re: 2.4 Performance

2008-11-19 Thread Eric Bowman
[EMAIL PROTECTED] wrote: On an index of around 20 gigs I've been seeing a performance drop of around 35% after upgrading to 2.4 (measured on ~1 identical requests, executed in parallel against a threaded Lucene / Apache setup, after a roughly 1 query warmup). The principal

Re: Special characters prevent entity being indexed

2008-11-19 Thread Pekka Nykyri
Thanks for the quick answer! I haven't specified the analyzer so it should be the StandardAnalyzer. I forgot to mention that I'm using Lucene via Hibernate Search, where I can easily define the fields in the Hibernate POJO classes. But as far as I know this shouldn't change things that much

InstatiatedIndex questions

2008-11-19 Thread David Causse
Hi, Here are some differences I noticed between InstantiatedIndex and RAMDirectory: - RAMDirectory seems to do a reset on tokenStreams the first time, which allows some objects to be initialised before streaming starts; InstantiatedIndex does not. - I can serialize a RAMDirectory but I

Re: Special characters prevent entity being indexed

2008-11-19 Thread Erick Erickson
I'm going to have to punt on what Hibernate does/doesn't do since I have no experience there. But in general analyzers are very important. StandardAnalyzer, for instance, tries to recognize e-mail addresses. So it'll create some very interesting tokens, some that are unexpected unless you really

Re: InstatiatedIndex questions

2008-11-19 Thread karl wettin
Hi David, thanks for the report! I suppose you speak of IndexWriter vs InstantiatedIndexWriter? These are definitely considered discrepancy problems. I've created a new issue in the tracker: http://issues.apache.org/jira/browse/LUCENE-1462 For what reason do you try to serialize the

Re: InstatiatedIndex questions

2008-11-19 Thread David Causse
Hi Karl, The reset() problem is not a serious one; I can adapt our TokenStreams. As for serialization: since we need to share very small indexes (200 docs max) in a cluster, we need to serialize something. I was planning to use Java serialization with maybe some compression on the

Spread of lucene score

2008-11-19 Thread excitingComm2
Hi everybody, as far as I know the Lucene score is an arbitrary number between 0.0 and 1.0. Is it correct that the scores in my result set are always normalised to this range, or is it possible to get higher scores? Regards, John W.

Re: Spread of lucene score

2008-11-19 Thread Mark Miller
excitingComm2 wrote: Hi everybody, as far as I know the Lucene score is an arbitrary number between 0.0 and 1.0. Is it correct that the scores in my result set are always normalised to this range, or is it possible to get higher scores? Regards, John W. Hits is the class that did the

How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Hi everybody: I need to search with Lucene 2.3.2 taking dates into account. Previously, when I built the index, I created a date field where I stored the year in which the document was created. At search time I would like to retrieve documents that were created before a year or

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ian Lea
Hi - sounds like you need a range query. http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Range%20Searches -- Ian. On Wed, Nov 19, 2008 at 4:02 PM, Ariel [EMAIL PROTECTED] wrote: Hi everybody: I need to search with Lucene 2.3.2 taking dates into account. Previously, when

altering the value of non indexed fields

2008-11-19 Thread Diego Cassinera
Hello all, I'm writing an application to move full-text search out of my RDBMS. Today the app hits the DB twice: 1) to do the search itself, and 2) to format the output of the search results. In my plan I'm moving everything to Lucene documents that contain fields where I will be doing the

How to obtain raw scores?

2008-11-19 Thread Teruhiko Kurosaka
Hello, Is there any way to obtain a raw hit score? I understand the deprecated Hits.getScore() returns normalized scores, relative to each query. Is TopDocs.scoreDocs[i].score also normalized, or raw? I'd like to compare confidence levels of hits among different queries. Thanks. T. Kuro

Re: 2.4 Performance

2008-11-19 Thread Paul Elschot
On Wednesday 19 November 2008 03:39:01, [EMAIL PROTECTED] wrote: ... Our design is roughly as follows: we have some pre-query filters, queries typically involving around 25 clauses, and some post-processing of hits. We collect counts and filter post query using a hit collector, which uses

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Thanks, that was very helpful, but I have a question: when I run the searches, the results are not sorted according to the range. For example, with year: [2003 TO 2008], the first page shows 2003 documents, the second 2005 documents, and the third 2004 documents. I don't see any sort

Re: Lucene implementation/performance question

2008-11-19 Thread Greg Shackles
I have a couple quick questions...it might just be because I haven't looked at this in a week now (got pulled away onto some other stuff that had to take priority). In the searching phase, I would run the search across all page documents, and then for each of those pages, do a search with

RE: How to obtain raw scores?

2008-11-19 Thread Teruhiko Kurosaka
Please ignore this question. I noticed it was answered in another thread just before I posted my question. Answer: use TopDocs.scoreDocs[i].score T. Kuro Kurosaka, Basis Technology San Francisco, California, U.S.A.

Re: IndexSearcher and multi-threaded performance

2008-11-19 Thread Tomer Gabel
It's more than possible, it's probable. Cache thrashing would definitely be my first guess; with so many copies of the exact same data you're not only missing out on significant gains with the L2 cache, you're also taking a major hit with every cache miss (which probably happens every context

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Lucene is supposed to sort lexicographically, but this is not happening. Could you tell me what I'm doing wrong? I hope you can help me. Regards On Wed, Nov 19, 2008 at 11:56 AM, Ariel [EMAIL PROTECTED] wrote: Thanks, that was very helpful, but I have a question when I make the searches

Re: Term numbering and range filtering

2008-11-19 Thread Paul Elschot
Tim, On Wednesday 19 November 2008 02:32:40, Tim Sturge wrote: ... This is less than 2x slower than the dedicated bitset and more than 50x faster than the range boolean query. Mike, Paul, I'm happy to contribute this (ugly but working) code if there is interest. Let me know and I'll

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ian Lea
Are you using one of the search methods that includes sorting? If not, then do. If you are, then you need to tell us exactly what you are doing and exactly what you reckon is going wrong. -- Ian. On Wed, Nov 19, 2008 at 6:23 PM, Ariel [EMAIL PROTECTED] wrote: Lucene is supposed to make a

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Ariel
Well, this is what I am doing: queryString = "year:[2003 TO 2005]" [CODE] Query pquery = null; Hits hits = null; Analyzer analyzer = null; analyzer = new SnowballAnalyzer("English"); try { pquery = MultiFieldQueryParser.parse(new String[] {queryString, queryString}, new

Re: How to search documents taking in account the dates ???

2008-11-19 Thread Erick Erickson
Well, MultiSearcher is just a Searcher, so you have available all of the search methods on Searcher. One of which is: public TopFieldDocs search(Query

Re: altering the value of non indexed fields

2008-11-19 Thread Michael McCandless
Unfortunately, not yet. There have been discussions about this, including this issue for column-stride fields: https://issues.apache.org/jira/browse/LUCENE-1231 But no real progress on it lately... Mike Diego Cassinera wrote: Hello all, I'm writing an application to move full

RE: How to obtain raw scores?

2008-11-19 Thread Ng Vinny
Hi, is there any documentation that says that scores obtained from TopDocs.scoreDocs[i].score are comparable across queries? I am having this problem myself, so I would really appreciate it if anyone has some pointers on this. At [1], it seems like they are not. Is there any solution to enable this