priv.onet.pl)

Szakáts Viktor Tue, 04 Nov 2008 13:27:14 -0800

Hi Przemek,

I've tested the hash version, and it seems equivalent
to the 1D array version in terms of disk size (in fact
it's _exactly_ the same size). The load time turned out
to be ~3 times (9.99s) the native 1D array load routine
(currently in SVN), and 2.5 times the 1D array loading
using deserialize.


Yes, it's expected, because during deserialization the hash array
is built from scratch, so it's sorted again.

Dunno if hash loading could be further optimized in any
way.


I can improve the speed by reading hash items in raw form,
but it may cause some bad side effects because hashes use
national collating by default. IMHO it's wrong and I would
like to change it. The default should be binary collating,
which is optimized for speed. If the user wants, he can enable


I agree completely.

national sorting order by hb_hSetBinary( hValue, .F. ) or
hb_hSetCaseMatch( hValue, .F. ). Simple binary sorting
should give a noticeable speed improvement. In that case
I can also try to trust that the order read from serialized data
is correct and eliminate hash array resorting.
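
To make the trade-off above concrete, here is a minimal sketch in Python (not Harbour, and not the actual Harbour implementation). It assumes a hash stored as a sorted key array searched with binary search; the function names load_resorting, load_trusting_order and lookup are hypothetical and exist only for this illustration. One loader re-sorts everything, the other accepts the stored order after a single linear check.

```python
from bisect import bisect_left


def load_resorting(pairs):
    """Rebuild the table from scratch: every key is sorted again (O(n log n))."""
    items = sorted(pairs, key=lambda kv: kv[0])
    return [k for k, _ in items], [v for _, v in items]


def load_trusting_order(pairs):
    """Accept the stored order after one linear sanity check (O(n))."""
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    # Byte-wise (binary) order does not depend on any collation setting,
    # so data written in sorted order is still sorted when read back.
    if any(a >= b for a, b in zip(keys, keys[1:])):
        return load_resorting(pairs)  # fall back if the data is not sorted
    return keys, values


def lookup(table, key, default=None):
    """Binary search over the sorted key array."""
    keys, values = table
    i = bisect_left(keys, key)
    return values[i] if i < len(keys) and keys[i] == key else default


if __name__ == "__main__":
    stored = [(b"apple", 1), (b"pear", 2), (b"zebra", 3)]
    table = load_trusting_order(stored)
    print(lookup(table, b"pear"))
```

The linear check is optional. Dropping it would match "trusting" the serialized order completely, at the cost of silently wrong lookups if the data was written under a different sort order.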


That's exactly what I had in mind :)

As I see it, it would be good to add support for storing hash
array flags in serialized data, and the hash default value if it's
set. I think it would also be good to introduce an internal
hash flag which forces resorting on first access, and to
implement an internal b-tree structure for hashes. That has been
waiting for a really long time. I'll try to make these modifications
in some spare time. Meanwhile I can hard-code support for binary
sorted hashes into the serialize procedure. It should give a noticeable
performance improvement, so the results will be comparable to
the current __I18N_LOADFROMMEMORY() function.
BTW, I do not see anything in the current __I18N_*() functions for
binary array sorting, so they will not work correctly with national
characters if you try to sort the array at Harbour level.
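
The "resort on first access" flag mentioned above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not Harbour code: the class LazySortedMap, its case_match flag (standing in for a stored hash flag) and the sort_count counter are all hypothetical names introduced here.

```python
from bisect import bisect_left


class LazySortedMap:
    """Hypothetical sorted-array map that defers sorting until first lookup."""

    def __init__(self, pairs, case_match=True):
        self._pairs = list(pairs)     # stored order, not trusted yet
        self.case_match = case_match  # flag that would travel with the data
        self._needs_sort = True       # the "resort on first access" flag
        self._keys = []
        self.sort_count = 0           # only here to show the sort runs once

    def _norm(self, key):
        # Case-insensitive matching compares lower-cased keys.
        return key if self.case_match else key.lower()

    def _ensure_sorted(self):
        if self._needs_sort:
            self._pairs.sort(key=lambda kv: self._norm(kv[0]))
            self._keys = [self._norm(k) for k, _ in self._pairs]
            self._needs_sort = False
            self.sort_count += 1

    def get(self, key, default=None):
        self._ensure_sorted()
        wanted = self._norm(key)
        i = bisect_left(self._keys, wanted)
        if i < len(self._keys) and self._keys[i] == wanted:
            return self._pairs[i][1]
        return default


if __name__ == "__main__":
    m = LazySortedMap([("b", 2), ("A", 1)], case_match=False)
    print(m.get("a"), m.sort_count)
```

Loading then costs only a copy of the raw items. The sort is paid once, by the first lookup, and is skipped entirely for hashes that are loaded but never searched.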


Unless I don't use hb_setcodepage() I guess. For me it
gave proper results, but I indeed didn't try with extra
collations, which could really be a problem.

I did some more speed comparisons with __i18n_gettext()
vs. direct hash lookup, and hash lookup turned out to be
30% quicker. This is the most speed sensitive function,
so these results are very promising.

If you can implement the raw hash deserialization - besides
being great stuff by itself -, I think we will be able
to simply drop the current special i18n functions, since
we have everything we need using existing Harbour elements,
with similar or possibly better overall performance. The
only remaining part will be the "high-level" API, possibly
in .c, as we've discussed.

Brgds,
Viktor

