priv.onet.pl)

Szakáts Viktor Tue, 04 Nov 2008 06:19:18 -0800

Hi Przemek,

I've tested the hash version, and it seems equivalent
to the 1D array version in terms of disk size (in fact
it's _exactly_ the same size). The load time turned out
to be ~3 times (9.99s) the native 1D array load routine
(currently in SVN), and 2.5 times the 1D array loading
using deserialize.


Dunno if hash loading could be further optimized in any
way.

Brgds,
Viktor

On 2008.11.02., at 0:25, Przemyslaw Czerpak wrote:

On Sat, 01 Nov 2008, Szakáts Viktor wrote:

Hi Viktor,

The other issue is that 2 dimensional array. Currently
I'm changing it to 1 dimensional on load, and lookup
is faster this way, and the whole translation array
takes less space. Maybe it doesn't matter, I don't know.


With hash array the search can be done without function
call by simple


BTW, I've now tested the saved size with serialize as is
(2 dimensional), and it's almost exactly the same as with the
current __i18n_save(). This is pretty good. Then I've
tested loading speed, and deserialize is only about half
as fast:
_i18n_loadfrommemory() [1D]: 3.6s
hb_deserialize() [2D]: 6.7s


Thanks for the tests.
The deserialization code always makes two passes.
The first one checks whether the passed data can be deserialized
without any errors, so some operations are doubled.
If we can "trust" that the data is valid then this pass can
be eliminated, though I do not think it's a good idea. Such
decoding can be done once and the overhead is rather small,
so I prefer to validate the whole operation to avoid serious
problems like a GPF or an out of memory message when corrupted data
is passed.
The serialization code also has protection and support for
arrays and hashes with cyclic references, so it will not
GPF if you pass an array with a cyclic reference to the serialize code.
It will be stored correctly and the cyclic references will
be restored correctly during deserialization. This protection
also costs a little. Together, these account for the speed difference.
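
A minimal sketch of the round trip described above, assuming a Harbour build that provides hb_Serialize() and hb_Deserialize(). The array contents are made up for illustration; the third element points back at the array itself to exercise the cyclic-reference handling.

```harbour
PROCEDURE Main()

   LOCAL aData := { "File", "Plik", NIL }   // illustrative data only
   LOCAL cBlob, aCopy

   // Make the third element refer back to the array itself (a cycle).
   aData[ 3 ] := aData

   // Serialize to a string, then rebuild the structure from it.
   cBlob := hb_Serialize( aData )
   aCopy := hb_Deserialize( cBlob )

   ? Len( cBlob )
   ? aCopy[ 1 ], aCopy[ 2 ]
   // Follow the nested reference; if the cycle was restored as described,
   // this reaches the same strings again.
   ? aCopy[ 3 ][ 1 ]

   RETURN
```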

1000 iterations with strings preloaded from disk,
for 9200 string pairs (682KB file for both) on a P4HT 2.6.
I've repeated the tests using a flattened 1 dimensional
array, which made the serialized files smaller than those
written by the i18n functions, and it gave this loading time:
hb_deserialize() [1D]: 3.95s
Which is pretty good, so the best option would be to use
hb_serialize/deserialize with a flat array, flattened
on save. (Saving is not speed or memory critical.)
I didn't explore hashes, as I have zero experience with them.


The most important thing is the performance of accessing I18N data
at runtime. Hash arrays allow translated strings to be accessed
without a function call. They are also very easy to manage at
.prg level. For example, this is a customized version of hbi18n.c
(without RT errors) which any user can write if
we use the standard serialization code and hashes.

  FUNC __I18N_SAVE( cFile, hTrans )
     return hb_memoWrit( cFile, hb_serialize( hTrans ) )

  FUNC __I18N_LOAD( cFile )
     return hb_deserialize( hb_memoRead( cFile ) )

  FUNC __I18N_LOADFROMMEMORY( cData )
     return hb_deserialize( cData )

  FUNC __I18N_GETTEXT( cText, hTrans )
     if cText $ hTrans
        cText := hTrans[ cText ]
     endif
     return cText

  PROC __I18N_ADDTRANS( hTrans, cText, cTrans )
     hTrans[ cText ] := cTrans

  FUNC __I18N_INITTRANS()
     return hb_hSetAutoAdd( { => } )

Isn't it simple?
For me such flexibility is very important.
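
As a rough illustration of the lookup without a function call, here is a small self-contained sketch. The sample strings are invented, and the hash is built with a literal so the example does not depend on auto-add settings; the serialize/deserialize pair stands in for saving to and loading from a file.

```harbour
PROCEDURE Main()

   // Illustrative translation table: source text => translated text.
   LOCAL hTrans := { "File" => "Plik", "Edit" => "Edycja" }
   LOCAL cData, hLoaded, cText

   // In-memory stand-in for the save/load pair.
   cData := hb_Serialize( hTrans )
   hLoaded := hb_Deserialize( cData )

   FOR EACH cText IN { "File", "Help" }
      // Inline lookup: test the key with $, fall back to the source text.
      ? cText, "->", iif( cText $ hLoaded, hLoaded[ cText ], cText )
   NEXT

   RETURN
```

"Help" has no entry in the table, so the lookup falls back to the untranslated text.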

Do you want to add domain support? Let's do it by changing just two functions:


  FUNC __I18N_GETTEXT( cText, hTrans, cDomain )
     local hDomain
     if cDomain == NIL
        cDomain := "[MAIN]"
     endif
     if cDomain $ hTrans
        hDomain := hTrans[ cDomain ]
        if cText $ hDomain
           cText := hDomain[ cText ]
        endif
     endif
     return cText

  PROC __I18N_ADDTRANS( hTrans, cText, cTrans, cDomain )
     if cDomain == NIL
        cDomain := "[MAIN]"
     endif
     if !cDomain $ hTrans
        hTrans[ cDomain ] := hb_hSetAutoAdd( { => } )
     endif
     hTrans[ cDomain, cText ] := cTrans


And now we only have to create a new function, __I18N_LOADPOT( cFile ),
which will do something like:

  FUNC __I18N_LOADPOT( cFile )
     LOCAL hTrans := __I18N_INITTRANS()
     LOCAL cLine, cDomain, cText, cTrans

     FOR EACH cLine in hb_aTokens( memoread( cFile ), hb_osNewLine() )
        IF cLine = "msgctxt "
           cDomain := substr( cLine, 10, len( cLine ) - 10 )
        ELSEIF cLine = "msgid "
           cText := substr( cLine, 8, len( cLine ) - 8 )
        ELSEIF cLine = "msgstr "
           cTrans := substr( cLine, 9, len( cLine ) - 9 )
           IF !EMPTY( cText )
              IF EMPTY( cTrans )
                 cTrans := cText
              ENDIF
              __I18N_ADDTRANS( hTrans, cText, cTrans, cDomain )
              cText := cDomain := NIL
           ENDIF
        ENDIF
     NEXT
     RETURN hTrans

It's a simplified version with only very basic functionality and without
error reporting for wrong .pot files. Nothing above is tested (just
written by finger) but it should work. Of course the .prg version is slower
than C code, but it can very easily be written in C as well. What matters
is an easy to manage format.
Of course it will be a little slower than a dedicated format with
code optimized for it (BTW your code seems to be optimal, very nice
job) but the speed difference should not be noticeable in normal
applications.
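
To make the SubStr() offsets in the .pot loader easier to follow, here is a small sketch that applies the same arithmetic to three hand-written sample lines (the lines themselves are assumptions, not taken from a real .pot file). It compares with Left() and == instead of the = prefix match, so it does not depend on the SET EXACT setting.

```harbour
PROCEDURE Main()

   LOCAL cLine

   // Each sample line is: keyword, space, then a double-quoted value.
   FOR EACH cLine IN { 'msgctxt "menu"', 'msgid "Open"', 'msgstr "Otworz"' }
      DO CASE
      CASE Left( cLine, 8 ) == "msgctxt "
         // Value starts after the 8-char keyword and the opening quote.
         ? "domain:", SubStr( cLine, 10, Len( cLine ) - 10 )
      CASE Left( cLine, 6 ) == "msgid "
         ? "text:", SubStr( cLine, 8, Len( cLine ) - 8 )
      CASE Left( cLine, 7 ) == "msgstr "
         ? "translation:", SubStr( cLine, 9, Len( cLine ) - 9 )
      ENDCASE
   NEXT

   RETURN
```

In each case the length passed to SubStr() drops the keyword, the separating space and both quotes, so only the text between the quotes is kept.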

best regards,
Przemek
_______________________________________________
Harbour mailing list
[email protected]
http://lists.harbour-project.org/mailman/listinfo/harbour


Re: [Harbour] 2008-11-01 21:13 UTC+0100 Przemyslaw Czerpak (druzus/at/priv.onet.pl)
