I remembered this problem but couldn't remember what caused it. I have
just asked Sergio, and the problem was the regular expression for
"proper names" in the dixes (I think they were experiencing it in en-es).

Hope it helps,

Gema.

On Sun, Apr 11, 2010 at 11:45 PM, Jimmy O'Regan <[email protected]> wrote:
> On 11 April 2010 22:33, Jacob Nordfalk <[email protected]> wrote:
>>
>>
>> 2010/4/9 Francis Tyers <[email protected]>
>>>
>>> On Fri, 09 Apr 2010 at 22:21 +0200, Maciej Jaskowski wrote:
>>>
>>> > Apparently, however, we have somewhat more demanding classes in Poland
>>> > ;-)
>>> > We are not only to profile some open-source software but also to
>>> > improve it (in terms of speed or memory), create a patch, etc.
>>> >
>>> > So my question is a little bit different: in which app/module/lib do
>>> > you think it is most likely some improvements could be made? And
>>> > well... of course the bigger the possible improvement the better. The
>>> > more important place, the better too ;-)
>>
>> I'll take the liberty of quoting myself (again, again :-)  :
>> Thinking more about this, I'd say I already know what you will find:
>> 1) Transfer is using 95% of the CPU in the translation process
>> 1a) Interpretation / XML tree walking is the major culprit, taking about
>> (say) 60%
>> 1b) Repeated use of the same regexps is also a culprit, taking (say) 25%
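The repeated-regexp cost in 1b) is usually avoided by compiling each pattern once and reusing the compiled object. A minimal sketch of that idea follows; `RegexCache` is a hypothetical helper, not an Apertium class, and `std::regex` is only a stand-in for whatever regex library the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not part of Apertium): compiles each distinct
// pattern once and returns the cached object on every later request.
class RegexCache {
public:
  const std::regex &get(const std::string &pattern) {
    auto it = cache_.find(pattern);
    if (it == cache_.end()) {
      // First time this pattern is seen: pay the compilation cost here.
      it = cache_.emplace(pattern, std::regex(pattern)).first;
    }
    // std::map never relocates its elements, so the reference stays valid.
    return it->second;
  }

  std::size_t size() const { return cache_.size(); }

private:
  std::map<std::string, std::regex> cache_;
};
```

Asking the cache twice for the same pattern string returns the same compiled object, so compilation is paid once per distinct pattern rather than once per use.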
>> So this will be the place for improvement: either pre-compile the XML or
>> make the XML interpretation faster. Take a look at, e.g.,
>> Transfer::processOut(xmlNode *localroot) in apertium/transfer.cc:
>> void
>> Transfer::processOut(xmlNode *localroot)
>> {
>>   for(xmlNode *i = localroot->children; i != NULL; i = i->next)
>>   {
>>     if(i->type == XML_ELEMENT_NODE)
>>     {
>> Here, a first optimization could be to pre-process the XML node tree so that
>> all children of any type other than XML_ELEMENT_NODE are cut away from the
>> tree beforehand. I haven't checked, but I suppose all the if-statements are
>> there to skip comments and whitespace.
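The pruning pass suggested above could look roughly like this. It is a sketch on a stand-in `Node` type, not on libxml2's `xmlNode`, and it assumes (as the quoted text also only supposes) that no transfer instruction depends on text or comment children. `NodeType`, `Node`, `addChild` and `pruneNonElements` are all hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for libxml2's xmlNode, reduced to what this sketch needs.
// (Hypothetical type; the real tree links siblings with raw pointers.)
enum class NodeType { Element, Text, Comment };

struct Node {
  NodeType type;
  std::vector<std::unique_ptr<Node>> children;
};

// Small helper to build test trees.
Node &addChild(Node &parent, NodeType type) {
  parent.children.push_back(std::unique_ptr<Node>(new Node{type, {}}));
  return *parent.children.back();
}

// Run once after loading the rules: drop every child that is not an
// element, so later tree walks need no per-node type check.
void pruneNonElements(Node &node) {
  auto &kids = node.children;
  kids.erase(std::remove_if(kids.begin(), kids.end(),
                            [](const std::unique_ptr<Node> &child) {
                              return child->type != NodeType::Element;
                            }),
             kids.end());
  for (auto &child : kids) {
    pruneNonElements(*child);
  }
}
```

With real libxml2 nodes the same pass would unlink and free each unwanted child instead of erasing it from a vector; the point is only that the filtering happens once at load time, not on every sentence.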
>> Another optimization could be to make faster versions of e.g.
>> Transfer::processInstruction(xmlNode *localroot)
>> {
>>   if(!xmlStrcmp(localroot->name, (const xmlChar *) "choose"))
>>   {
>>     processChoose(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "let"))
>>   {
>>     processLet(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "append"))
>>   {
>>     processAppend(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "out"))
>>   {
>>     processOut(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "call-macro"))
>>   {
>>     processCallMacro(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "modify-case"))
>>   {
>>     processModifyCase(localroot);
>>   }
>> }
>> Check whether strcmp() would be faster than xmlStrcmp(). And, as the XML
>> can be assumed to validate against a DTD, switch on the first letter
>> (with an extra branch on 'c' for 'choose' and 'call-macro').
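The first-letter switch described above might be sketched as follows. `Instr` and `classify` are hypothetical names, the function takes a plain `const char *` (libxml2's `xmlChar` names would need a cast), and it relies on the stated assumption that the rule file has already been validated, so no other element name starting with these letters reaches it.

```cpp
#include <cassert>

// Hypothetical tags for the six instructions handled by
// Transfer::processInstruction in the quoted code.
enum class Instr { Choose, Let, Append, Out, CallMacro, ModifyCase, Unknown };

// Classify an element name by its first letter. Only 'c' is ambiguous
// ("choose" vs "call-macro"), so it needs a look at the second letter.
// Assumes DTD-validated input: names are not compared in full.
Instr classify(const char *name) {
  assert(name != nullptr);
  switch (name[0]) {
    case 'c':
      return name[1] == 'h' ? Instr::Choose : Instr::CallMacro;
    case 'l':
      return Instr::Let;
    case 'a':
      return Instr::Append;
    case 'o':
      return Instr::Out;
    case 'm':
      return Instr::ModifyCase;
    default:
      return Instr::Unknown;
  }
}
```

A caller would then switch on the returned `Instr` value to reach processChoose, processLet and so on. Whether this actually beats the chain of xmlStrcmp() calls would still have to be measured with a profiler.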
>>
>>>
>>> According to Jacob there is some issue with lt-proc when analysing many
>>> sentences. It would be good to profile this, and find out where it is
>>> happening and fix it.
>>
>> I think Fran means 'transfer'.
>>
>
> Some guys at DCU were having this problem with lt-proc (i.e., not
> using transfer)
>
>> I wrote:
>> $ time bzcat eowiki.crp.txt.bz2 | apertium eo-en > /dev/null
>> It seems there is something in the C++ version that makes it go slower and
>> slower, to an almost complete standstill after approx. 15000 lines of the
>> corpus.
>> It may not have been clear, but it is apertium-transfer that gets slower
>> and slower.
>>
>
>
>
> --
> <Leftmost> jimregan, that's because deep inside you, you are evil.
> <Leftmost> Also not-so-deep inside you.
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
