Hello Jon,
The biggest difference in time I have found was between:
console.writeline(hits.id(i))
and
console.writeline(hits.doc(i).get(fieldName))

If I return the internal ID within this code, it is a lot faster than
returning a field value through ...get().

Overview of the current code:
dim qry as search.query = (...)
dim sw as new io.streamwriter(...)
dim hits as search.hits
hits = lis.search(qry) ' lis (the searcher) is defined once at the start of the code
console.write(hits.length)
console.write(" writing file ")
dim intposmax as integer = hits.length - 1
for intpos as integer = 0 to intposmax
  if intpos <> 0 then sw.write(",")
  sw.write(hits.doc(intpos).get("id").tostring)
next
sw.close()
console.write(" - bulk insert ")

... bulk insert from the file written by sw.write

So you can see the time needed for the search and for the bulk insert in the
console. The bulk insert is not that fast on large result sets, but the search
is still slower - so that is my primary bottleneck :).

I already did some tests comparing hits.id(intPos) to hits.doc(intpos).get("id")
- those two showed a big difference in the time they take...
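
For reference, the faster variant I mentioned only differs in the inner write -
a rough sketch (same surrounding code as above, not the exact code I ran):

for intpos as integer = 0 to intposmax
  if intpos <> 0 then sw.write(",")
  ' internal Lucene doc id - no stored-field lookup via doc().get()
  sw.write(hits.id(intpos).tostring)
next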

Best Regards, Marc



On 10/30/06, Jon Palmer <[EMAIL PROTECTED]> wrote:

Marc,



Can you give a few more details of how you are searching Lucene? Maybe
some pseudo code of the method that is fast and the one that is slow. I
think you're suggesting that there is a very large performance hit for
doing this:



DocID = Hits.Doc(i).Get("ID")



rather than:



DocID = Hits.ID(i)





JP



P.S. Your numbers suggest that your problem is mostly linear. It looks
like your method has some setup cost and then processes approx. 300 IDs a
second:



18260 ID's - 72.2 s  -avg 253/s
3000 ID's - 10.02s  -avg 294/s
830 ID's - 2.25s  -avg 368/s
352 ID's - 1.08s  -avg 325/s
350 ID's - 0.98s  -avg 357/s
278 ID's - 0.48s  -avg 162/s
96 ID's - 1.05s  -avg 91/s
29 ID's - 0.66s  -avg 43/s



Given this linear-ish behavior, are you sure that the bottleneck is not
writing back to the file or to SQL?
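
One way to check is to time each stage separately - a rough VB sketch along
the lines of what you describe (untested; searcher, query and writer are
assumed names, timing via System.Diagnostics.Stopwatch):

dim swatch as new diagnostics.stopwatch

swatch.start()
dim hits as search.hits = searcher.search(query)
console.write(" search: " & swatch.elapsedmilliseconds & "ms")

swatch.reset() : swatch.start()
for i as integer = 0 to hits.length - 1
  if i <> 0 then writer.write(",")
  writer.write(hits.doc(i).get("ID"))
next
writer.close()
console.write(" write file: " & swatch.elapsedmilliseconds & "ms")

swatch.reset() : swatch.start()
' ... bulk insert from the file into SQL here ...
console.write(" bulk insert: " & swatch.elapsedmilliseconds & "ms")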







-----Original Message-----
From: Kaufmann M. [mailto:[EMAIL PROTECTED]
Sent: Monday, October 30, 2006 5:11 AM
To: [email protected]
Subject: Re: Storing primary key / Change lucene's document ID



Hello George,

The problem is the speed; some samples:



All counts include writing the IDs to file and the BULK INSERT to SQL:

18260 ID's - 72.2 s
352 ID's - 1.08s
96 ID's - 1.05s
29 ID's - 0.66s
3000 ID's - 10.02s
350 ID's - 0.98s
278 ID's - 0.48s
830 ID's - 2.25s



As you can see, the time it takes for record counts > 500 is really
slow...
If I write back the internal ID, it's a LOT faster...



I'm not using the Lucene ordering because this also slowed down the returning
process a lot.
And I'd like to count the results in different ways (which I was not able to
do in Lucene), so I have to give back all the IDs to SQL...



Thanks for helpin'!





On 10/30/06, George Aroush <[EMAIL PROTECTED]> wrote:

>
> Hi Marc,
>
> You can't depend on Lucene's internal ID, it will change every time you
> update the index -- this is something you can't control.  The way you are
> currently doing it, by storing an ID in a field named "id", is the right
> way to do it.  Don't worry about slowing down Lucene if you call the API
> to get the ID of your field "id".  Lucene is super fast.
>
> Regards,
>
> -- George Aroush
>
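
For reference, the way I currently store that external key at index time is
roughly this - a sketch from memory, not the exact code (primaryKey comes
from SQL, idxWriter is the IndexWriter):

dim doc as new documents.document
' store the primary key so it can be read back from search results;
' UN_TOKENIZED keeps it as one untokenized term
doc.add(new documents.field("id", primaryKey.tostring, _
    documents.field.store.YES, documents.field.index.UN_TOKENIZED))
idxWriter.addDocument(doc)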

> -----Original Message-----
> From: Kaufmann M. [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 27, 2006 4:20 PM
> To: [email protected]
> Subject: Storing primary key / Change lucene's document ID
>
> Hello everybody,
> I've got a little question concerning the unique ID stored in the Lucene
> index (hits.ID(i)).
> Is it possible to change this ID, or set it on doc.add?
>
> Currently I'm running a test project which stores an external primary key
> in a field named 'id', but if I call it from the search engine I have to
> use the get-method - which slows it down.
> If I could use this primary key as the Lucene ID, the whole engine would
> be a lot faster because I just need the IDs returned...
>
> Does anybody know if this is possible?
>
> Thanks!
> Best Regards, Marc
>




