Hi, I want to index an Excel file with Nutch and I get the following error:
http://dev.torrez.us/public/2006/pundit/java/src/plugin/parse-msexcel/sample/test.xls:
failed(2,0): Can't be handled as Microsoft document.
java.lang.ArrayIndexOutOfBoundsException: No cell at position col1, row 0.
I
On Monday, 19 November 2007, flateric wrote:
the number returned by delete is 0, but the uid shows up in Luke so it
is there.
Not sure what the problem might be, but it can surely be analyzed if you
write a small self-contained test-case and post it here.
Regards
Daniel
--
Our document contains a total of 23 fields in one document and we STORE all
of them in lucene index.
We have recently had some performance issues and our analysis has shown the
bottleneck to be lucene search and retrieval.
We have been thinking about reducing the number of fields per document
On Nov 20, 2007, at 6:29 AM, kumarlimbu wrote:
Our document contains a total of 23 fields in one document and we
STORE all
of them in lucene index.
We have recently had some performance issues and our analysis has
shown the
bottleneck to be lucene search and retrieval.
Perhaps you
You can also rely on the fact that, by default, documents are
collected in doc-id order. You can therefore use your own
hit collector that, when collecting the doc with id n2,
and assuming the previous doc collected had id n1,
would (know to) assign score 0 to all docs
with: n1 < id < n2.
In other words, you can know
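
The gap-filling idea described above could be sketched roughly as follows. This is only an illustration in plain Java: the class name GapAwareCollector and its finish/docs/scores methods are made up for this example, and the class does not extend any Lucene class. It only mirrors the shape of a collect(doc, score) callback and assumes that calls arrive in increasing doc-id order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Lucene.
class GapAwareCollector {
    private final List<Integer> docs = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();
    private int previousDoc = -1; // nothing collected yet

    // Called once per matching doc; assumes increasing doc-id order.
    void collect(int doc, float score) {
        if (doc <= previousDoc) {
            throw new IllegalStateException("doc ids must be increasing");
        }
        fillGap(doc);           // ids with previousDoc < id < doc did not match
        docs.add(doc);
        scores.add(score);
        previousDoc = doc;
    }

    // Call after the search: ids after the last hit, below maxDoc, also get 0.
    void finish(int maxDoc) {
        fillGap(maxDoc);
        previousDoc = Math.max(previousDoc, maxDoc - 1);
    }

    // Records score 0 for every id strictly between previousDoc and upTo.
    private void fillGap(int upTo) {
        for (int id = previousDoc + 1; id < upTo; id++) {
            docs.add(id);
            scores.add(0f);
        }
    }

    List<Integer> docs() { return docs; }
    List<Float> scores() { return scores; }
}
```

With hits collected for docs 1 and 4 and finish(6) called afterwards, docs 0, 2, 3 and 5 end up with score 0 without ever being passed to collect.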
Hi,
I would like to have a query parser which is fault tolerant. I have searched
the archive, and I have found this:
http://www.nabble.com/Error-tolerant-query-parsing-tf108987.html#a300382
I also want my parser to have very simple features: phrase search and field
search. So I need to
How are you doing your search? When you say lucene is the
bottleneck, that encompasses a lot. You really need to pinpoint things
a bit more
1. Are you iterating over the hits object for many docs? This is bad.
2. Are you using a HitCollector and reading the doc each time you get
to the collect
I've been stepping through the contrib MoreLikeThis class and was
wondering if people can give opinions on why you would or would not use
setBoost(true) for the MoreLikeThis object. It seems a bit odd (at least
to me) to boost the good terms in the query (based on the term's score),
since
: Well, the javadocs as patched at LUCENE-584 try to change all
: the cases of zero scoring to 'non matching'.
I'm very out of the loop on LUCENE-584, but I think supporting scores = 0
is an important use case that shouldn't go away ... it makes
CustomScoringQueries a lot more powerful, and
There is absolutely no reason to send separate, identical, emails to
solr-user, nutch-user, and java-user ... particularly when it's clear from
past emails you knew your question was specific to nutch since you'd
already asked it on nutch-user 4 days earlier...
: I apologize for cross-posting but I believe both Solr and Lucene users and
: developers should be concerned with this. I am not aware of a better way to
: reach both communities.
Some of these questions strike me as being largely unrelated. If
anyone wishes to follow up on them further,