Re: Building lucene index using 100 Gb Mobile HardDisk

2007-02-05 Thread John Haxby
maureen tanuwidjaja wrote: Oh, is it? I didn't know about that... so does it mean I can't use this Mobile HDD? Damien McCarthy [EMAIL PROTECTED] wrote: FAT32 imposes a lower file size limit than NTFS. Attempts to create files larger than 4 GB on FAT32 will throw the error you are seeing. Not
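
For reference, the FAT32 per-file ceiling discussed in this thread is 2^32 - 1 bytes, one byte short of 4 GiB. A minimal Java sketch of a size check follows. The class and method names are hypothetical, and the check only compares byte counts; it does not detect which file system a drive actually uses.

```java
class Fat32Limit {
    // Largest single file FAT32 can hold: 2^32 - 1 bytes.
    static final long FAT32_MAX_FILE_BYTES = (1L << 32) - 1;

    // Hypothetical helper: true if a file of this size fits within the FAT32 limit.
    static boolean fitsOnFat32(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= FAT32_MAX_FILE_BYTES;
    }

    public static void main(String[] args) {
        long fiveGiB = 5L * 1024 * 1024 * 1024;
        System.out.println("5 GiB file fits on FAT32: " + fitsOnFat32(fiveGiB));
    }
}
```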

Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks for taking the time to answer me. The problem is that I'm not allowed to post code because of a confidentiality contract I was required to sign. I'll try to get special permission to post code, since I'm wasting so much time trying to find the answer to this. I tried looking

Re: Problem with a search engine

2007-02-05 Thread Mark Miller
StandardAnalyzer does not index pure numbers. It will index alphanumeric tokens and numbers that are connected with one of: '_', '-', '/', '.', ','. If you wish to index pure numbers you might want to add another regex to StandardAnalyzer that recognizes a series of digits - don't forget to add the new token

Re : Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks for your help. As I stated before, the numbers, whether pure or not, are indexed, because I can search for them with Luke. But supposing what you're saying were the case, the search for 10-year should return 4 items (according to the number of occurrences found by Luke). Problem is that the number

Re: Re : Re: Problem with a search engine

2007-02-05 Thread Erick Erickson
Have you tried looking at the actual query submitted with Query.toString()? That might give you an insight into what is actually being submitted to Lucene and a place to start. Also be aware that in QueryParser the default operator is OR, which can produce unexpected results if you assume AND.

Re : Re: Re : Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks for your help! Wow, I never expected that many replies. Cool! I did try to print out the query before and after it gets processed by QueryParser. Let's say my query is 2003: before and after, it will be 2003. If I put report 2003 the query will be, before and after getting into the

Re: Re : Re: Re : Re: Problem with a search engine

2007-02-05 Thread Chiradeep Vittal
Perhaps the numbers (dates?) are being indexed in a separate field? Lucene will only search the default field with the queries you have shown. If, for instance, the year was being stored in the year field, then your query should be report AND year:2003 HTH

Re : Re: Re : Re: Re : Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks! I thought that would be the case too, but it's not. 2003 is just stored in the contents field like everything else. The only field indexed is the contents field. Since only the contents field is indexed, everything that is searched should be found. The number problem does restrict

relevancy buckets and secondary searching

2007-02-05 Thread Erick Erickson
Am I missing anything obvious here and/or what would folks suggest... Conceptually, I want to normalize the scores of my documents during a search BUT BEFORE SORTING into 5 discrete values, say 0.1, 0.3, 0.5, 0.7, 0.9 and apply a secondary sort when two documents have the same score. Applying
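
The bucketing idea described above can be sketched in plain Java without any Lucene classes. This is only an illustration under stated assumptions: scores are assumed to be already normalized to the range 0 to 1, the five bucket values are the ones named in the post, and the Hit type, its title field (standing in for the secondary sort key) and the equal-width bucket boundaries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScoreBuckets {
    // Bucket values named in the post.
    static final float[] BUCKETS = {0.1f, 0.3f, 0.5f, 0.7f, 0.9f};

    /** Hypothetical hit: document id, normalized score, secondary sort key. */
    static final class Hit {
        final int doc;
        final float score;
        final String title;

        Hit(int doc, float score, String title) {
            this.doc = doc;
            this.score = score;
            this.title = title;
        }
    }

    /** Maps a score in [0, 1] to one of five equal-width buckets (assumed boundaries). */
    static float bucket(float score) {
        int i = (int) (score * BUCKETS.length);
        if (i < 0) i = 0;
        if (i >= BUCKETS.length) i = BUCKETS.length - 1; // score == 1.0 lands in the top bucket
        return BUCKETS[i];
    }

    /** Higher bucket first; ties within a bucket are broken by title, ascending. */
    static final Comparator<Hit> ORDER = (a, b) -> {
        int c = Float.compare(bucket(b.score), bucket(a.score));
        return c != 0 ? c : a.title.compareTo(b.title);
    };

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.95f, "b"));
        hits.add(new Hit(2, 0.85f, "a"));
        hits.add(new Hit(3, 0.42f, "c"));
        hits.sort(ORDER);
        for (Hit h : hits) {
            System.out.println(h.doc + " bucket=" + bucket(h.score) + " title=" + h.title);
        }
    }
}
```

Because the comparator looks only at the bucket, two hits with raw scores 0.95 and 0.85 compare as equal on relevance and fall through to the secondary key.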

Re: relevancy buckets and secondary searching

2007-02-05 Thread Peter Keegan
Hi Erick, The timing of your posting is ironic because I'm currently working on the same issue. Here's a solution that I'm going to try: Use a HitCollector with a PriorityQueue to sort all hits by raw Lucene score, ignoring the secondary sort field. After the search, re-sort just the hits from
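
A rough sketch of the collect-then-re-sort pattern, again in plain Java rather than against Lucene's HitCollector or its own PriorityQueue class. The snippet above is cut off before it says which hits get re-sorted, so this sketch assumes it is simply the retained top-N hits. The Hit and TopN types and the key field are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopHitsSketch {
    /** Hypothetical hit: document id, raw score, secondary sort key. */
    static final class Hit {
        final int doc;
        final float score;
        final String key;

        Hit(int doc, float score, String key) {
            this.doc = doc;
            this.score = score;
            this.key = key;
        }
    }

    /** Keeps the n best hits by raw score, ignoring the secondary key while collecting. */
    static final class TopN {
        private final int n;
        // Min-heap on score: the weakest retained hit sits at the head.
        private final PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

        TopN(int n) {
            if (n < 1) throw new IllegalArgumentException("n must be at least 1");
            this.n = n;
        }

        /** Called once per matching document, as a collector callback would be. */
        void collect(Hit h) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (h.score > heap.peek().score) {
                heap.poll(); // evict the current weakest hit
                heap.add(h);
            }
        }

        /** Assumed re-sort step: order only the retained hits with the given comparator. */
        List<Hit> resorted(Comparator<Hit> order) {
            List<Hit> out = new ArrayList<>(heap);
            out.sort(order);
            return out;
        }
    }

    public static void main(String[] args) {
        TopN top = new TopN(2);
        top.collect(new Hit(1, 0.2f, "z"));
        top.collect(new Hit(2, 0.9f, "b"));
        top.collect(new Hit(3, 0.5f, "a"));
        top.collect(new Hit(4, 0.1f, "c"));
        for (Hit h : top.resorted((a, b) -> a.key.compareTo(b.key))) {
            System.out.println(h.doc + " score=" + h.score + " key=" + h.key);
        }
    }
}
```

The heap is bounded at n entries, so memory stays constant regardless of how many documents match; the comparatively expensive secondary sort then touches only those n hits.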

RE: An arguable bug in Lucene 1.9.1

2007-02-05 Thread Lee_Gary
I am seeing this issue as well, with the exact same stack trace, using SpanQueries. Does anyone know if this has been fixed for versions after 1.9.1? Thanks, Gary -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 23, 2006 07:23 AM To:

Re: relevancy buckets and secondary searching

2007-02-05 Thread Erick Erickson
Well, I'm glad to see that it's not just me <G>. What I'd tentatively thought about was using a TopDocs (search(Query, Filter, Num) form) to get me the top Num documents (by relevance), then doing the bucketing and sorting myself at that point. But your suggestion made me look at FieldSortedHitQueue