Marc,
Can you give a few more details on how you are searching Lucene? Maybe
some pseudocode of the method that is fast and the one that is slow. I
think you are suggesting that there is a very large performance hit for
doing this:
DocID = Hits.Doc(i).Get("ID")
rather than:
DocID = Hits.ID(i)
JP
P.S. Your numbers suggest that your problem is mostly linear. It looks
like your method has some setup cost and then processes approx. 300 IDs a
second:
18260 ID's - 72.2 s -avg 253/s
3000 ID's - 10.02s -avg 299/s
830 ID's - 2.25s -avg 368/s
352 ID's - 1.08s -avg 325/s
350 ID's - 0.98s -avg 357/s
278 ID's - 0.48s -avg 579/s
96 ID's - 1.05s -avg 91/s
29 ID's - 0.66s -avg 43/s
Given this linear-ish behavior, are you sure that the bottleneck is not
writing back to file or to SQL?
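The averages above can be rechecked with a rough sketch like the following. It recomputes IDs per second for each sample and fits a least-squares line, seconds = setup + per_id * count, where setup would be the fixed cost and 1/per_id the steady-state rate. The function names throughput and fit_line are made up for illustration; only the (count, seconds) pairs come from the timings quoted in this thread.

```python
# (n_ids, seconds) pairs copied from the timings quoted in this thread.
SAMPLES = [
    (18260, 72.2), (3000, 10.02), (830, 2.25), (352, 1.08),
    (350, 0.98), (278, 0.48), (96, 1.05), (29, 0.66),
]


def throughput(samples):
    """Average IDs per second for each (count, seconds) sample."""
    return [n / t for n, t in samples]


def fit_line(samples):
    """Ordinary least-squares fit of seconds = setup + per_id * count.

    Hypothetical helper: returns (setup, per_id).
    """
    k = len(samples)
    mean_n = sum(n for n, _ in samples) / k
    mean_t = sum(t for _, t in samples) / k
    sxx = sum((n - mean_n) ** 2 for n, _ in samples)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    per_id = sxy / sxx
    setup = mean_t - per_id * mean_n
    return setup, per_id


if __name__ == "__main__":
    for (n, t), rate in zip(SAMPLES, throughput(SAMPLES)):
        print(f"{n:>6} IDs  {t:>6.2f} s  {rate:7.1f} IDs/s")
    setup, per_id = fit_line(SAMPLES)
    print(f"fit: setup = {setup:.2f} s, rate = {1 / per_id:.0f} IDs/s")
```

The fit is dominated by the 18260-ID sample, so the setup term in particular should not be read too closely. Timing the Lucene loop and the file/SQL write separately would say more than any fit over the combined numbers.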
-----Original Message-----
From: Kaufmann M. [mailto:[EMAIL PROTECTED]
Sent: Monday, October 30, 2006 5:11 AM
To: [email protected]
Subject: Re: Storing primary key / Change lucene's document ID
Hello George,
The problem is the speed. Some samples:
All timings include writing the IDs to file and the BULK INSERT into SQL:
18260 ID's - 72.2 s
352 ID's - 1.08s
96 ID's - 1.05s
29 ID's - 0.66s
3000 ID's - 10.02s
350 ID's - 0.98s
278 ID's - 0.48s
830 ID's - 2.25s
As you can see, with more than 500 records it gets really slow...
If I write back the internal ID, it's a LOT faster...
I'm not using Lucene's ordering because this also slowed down the
returning process a lot.
And I'd like to count the results in different ways (which I was not
able to do in Lucene), so I have to give all the IDs back to SQL...
Thanks for helpin'!
On 10/30/06, George Aroush <[EMAIL PROTECTED]> wrote:
>
> Hi Marc,
>
> You can't depend on Lucene's internal ID; it will change every time you
> update the index, and this is something you can't control. The way you are
> currently doing it, by storing an ID in a field named "id", is the right way
> to do it. Don't worry about slowing down Lucene if you call the API to get
> the value of your "id" field. Lucene is super fast.
>
> Regards,
>
> -- George Aroush
>
> -----Original Message-----
> From: Kaufmann M. [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 27, 2006 4:20 PM
> To: [email protected]
> Subject: Storing primary key / Change lucene's document ID
>
> Hello everybody,
> I've got a little question concerning the unique ID stored in the Lucene
> index (hits.ID(i)).
> Is it possible to change this ID, or set it on doc.add?
>
> Currently I'm running a test project which stores an external primary key in
> a field named 'id', but if I call it from the search engine I have to use
> the get method, which slows it down.
> If I could use this primary key as the Lucene ID, the whole engine would be a
> lot faster because I just need the IDs returned...
>
> Does anybody know if this is possible?
>
> Thanks!
> Best Regards, Marc
>
>