indexing excel file

2007-11-20 Thread crazy
Hi, I want to index an Excel file with Nutch and I get the following error: http://dev.torrez.us/public/2006/pundit/java/src/plugin/parse-msexcel/sample/test.xls: failed(2,0): Can't be handled as Microsoft document. java.lang.ArrayIndexOutOfBoundsException: No cell at position col1, row 0. I

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-20 Thread Daniel Naber
On Montag, 19. November 2007, flateric wrote: the number returned by delete is 0, but the uid shows up in Luke so it is there. Not sure what the problem might be, but it can surely be analyzed if you write a small self-contained test-case and post it here. Regards Daniel --

Is storing 20 fields in a lucene document desirable?

2007-11-20 Thread kumarlimbu
Our document contains a total of 23 fields in one document and we STORE all of them in lucene index. We have recently had some performance issues and our analysis has shown the bottleneck to be lucene search and retrieval. We have been thinking about reducing the number of fields per document

Re: Is storing 20 fields in a lucene document desirable?

2007-11-20 Thread Grant Ingersoll
On Nov 20, 2007, at 6:29 AM, kumarlimbu wrote: Our document contains a total of 23 fields in one document and we STORE all of them in lucene index. We have recently had some performance issues and our analysis has shown the bottleneck to be lucene search and retrieval. Perhaps you

Re: Scoring for all the documents in the index relative to a query

2007-11-20 Thread Doron Cohen
You can also rely on the fact that, by default, documents are collected in docid order. You can therefore use your own hit collector that, when collecting the doc with id n2, assuming the previous doc collected had id n1, would (know to) assign score 0 to all docs with: n1 < id < n2. In other words, you can know
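The gap-filling idea described above can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: the class name DenseScoreCollector, its finish() method, and the maxDoc constructor argument are hypothetical, and collect(int doc, float score) merely mirrors the shape of a hit-collector callback rather than extending Lucene's own class. It assumes documents arrive in strictly increasing docid order, as the message says is the default.

```java
import java.util.Arrays;

// Hypothetical stand-in for a custom hit collector; not a Lucene class.
public class DenseScoreCollector {
    private final float[] scores; // one slot per document in the index
    private int lastDoc = -1;     // id of the previously collected doc (n1)

    // maxDoc is assumed to be the total number of documents in the index.
    public DenseScoreCollector(int maxDoc) {
        this.scores = new float[maxDoc];
    }

    // Called once per matching document, in increasing docid order.
    public void collect(int doc, float score) {
        if (doc <= lastDoc) {
            throw new IllegalStateException("docs must arrive in docid order");
        }
        // Every id strictly between the previous doc and this one did not match.
        for (int id = lastDoc + 1; id < doc; id++) {
            scores[id] = 0f;
        }
        scores[doc] = score;
        lastDoc = doc;
    }

    // Zero out the docs after the last match and return one score per doc.
    public float[] finish() {
        for (int id = lastDoc + 1; id < scores.length; id++) {
            scores[id] = 0f;
        }
        return scores;
    }

    public static void main(String[] args) {
        DenseScoreCollector collector = new DenseScoreCollector(6);
        collector.collect(1, 0.8f);
        collector.collect(4, 0.3f);
        // prints [0.0, 0.8, 0.0, 0.0, 0.3, 0.0]
        System.out.println(Arrays.toString(collector.finish()));
    }
}
```

In a real collector the same bookkeeping would live inside the callback that the searcher invokes, with finish() called once the search returns.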

Custom query parser

2007-11-20 Thread Nicolas Lalevée
Hi, I would like to have a query parser which is fault tolerant. I have searched the archive and found this: http://www.nabble.com/Error-tolerant-query-parsing-tf108987.html#a300382 I also want my parser to have very simple features: phrase search and field search. So I need to

Re: Is storing 20 fields in a lucene document desirable?

2007-11-20 Thread Erick Erickson
How are you doing your search? When you say lucene is the bottleneck, that encompasses a lot. You really need to pinpoint things a bit more: 1) are you iterating over the hits object for many docs? This is bad. 2) are you using a HitCollector and reading the doc each time you get to the collect

MoreLikeThis and setBoost

2007-11-20 Thread Donna L Gresh
I've been stepping through the contrib MoreLikeThis class and was wondering if people can give opinions on why you would or would not use setBoost(true) for the MoreLikeThis object. It seems a bit odd (at least to me) to boost the good terms in the query (based on the term's score), since

Re: Scoring for all the documents in the index relative to a query

2007-11-20 Thread Chris Hostetter
: Well, the javadocs as patched at LUCENE-584 try to change all : the cases of zero scoring to 'non matching'. I'm very out of the loop on LUCENE-584, but i think supporting scores = 0 is an important use case that shouldn't go away ... it makes CustomScoringQueries a lot more powerful, and

Re: indexing excel file

2007-11-20 Thread Chris Hostetter
There is absolutely no reason to send separate, identical, emails to solr-user, nutch-user, and java-user ... particularly when it's clear from past emails you knew your question was specific to nutch since you'd already asked it on nutch-user 4 days earlier...

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-20 Thread Chris Hostetter
: I apologize for cross-posting but I believe both Solr and Lucene users and : developers should be concerned with this. I am not aware of a better way to : reach both communities. some of these questions strike me as being largely unrelated. if anyone wishes to followup on them further,