I remembered this problem but couldn't recall what the cause was. I have just asked Sergio: the problem was the regular expression for "proper names" in the dixes (I think they were experiencing it in en-es).
Hope it helps,
Gema.

On Sun, Apr 11, 2010 at 11:45 PM, Jimmy O'Regan <[email protected]> wrote:
> On 11 April 2010 22:33, Jacob Nordfalk <[email protected]> wrote:
>>
>>
>> 2010/4/9 Francis Tyers <[email protected]>
>>>
>>> On Fri, 09/04/2010 at 22:21 +0200, Maciej Jaskowski wrote:
>>>
>>> > Apparently, however, we have somewhat more demanding classes in Poland
>>> > ;-)
>>> > We are not only to do profiling of some open-source stuff but also to
>>> > boost things (in terms of speed or memory), create a patch etc.
>>> >
>>> > So my question is a little bit different: in which app/module/lib do
>>> > you think it is most likely some improvements could be made? And
>>> > well... of course the bigger the possible improvement the better. The
>>> > more important the place, the better too ;-)
>>
>> I'll take the liberty of quoting myself (again, again :-) :
>> Thinking more about this, I'd say I already know what you will find:
>> 1) Transfer is using 95% of the CPU in the translation process
>> 1a) interpretation / XML tree walking is the major culprit, taking about
>> (say) 60 %
>> 1b) Repeated use of the same regexps is also a culprit, taking (say) 25
>> %
>> So this will be the place for improvement: pre-compile the XML, or
>> make the XML interpretation faster. Take a look at
>> e.g. Transfer::processOut(xmlNode *localroot) in apertium/transfer.cc:
>> void
>> Transfer::processOut(xmlNode *localroot)
>> {
>>   for(xmlNode *i = localroot->children; i != NULL; i = i->next)
>>   {
>>     if(i->type == XML_ELEMENT_NODE)
>>     {
>> Here, a first optimization could be to pre-process the XML node tree so that
>> all children of any type other than XML_ELEMENT_NODE are cut away from the
>> tree beforehand. I haven't checked, but I suppose all the if-statements are
>> there to skip comments and whitespace.
>> Another optimization could be to make faster versions of e.g.
>> Transfer::processInstruction(xmlNode *localroot)
>> {
>>   if(!xmlStrcmp(localroot->name, (const xmlChar *) "choose"))
>>   {
>>     processChoose(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "let"))
>>   {
>>     processLet(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "append"))
>>   {
>>     processAppend(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "out"))
>>   {
>>     processOut(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "call-macro"))
>>   {
>>     processCallMacro(localroot);
>>   }
>>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "modify-case"))
>>   {
>>     processModifyCase(localroot);
>>   }
>> }
>> Check whether strcmp() would be faster than xmlStrcmp(). And, as the XML can
>> be assumed to validate against a DTD, make a switch on the first letter
>> (with an extra branch on 'c' for 'choose' and 'call-macro').
>>
>>>
>>> According to Jacob there is some issue with lt-proc when analysing many
>>> sentences. It would be good to profile this, and find out where it is
>>> happening and fix it.
>>
>> I think Fran means 'transfer'.
>>
>
> Some guys at DCU were having this problem with lt-proc (i.e., not
> using transfer)
>
>> I wrote:
>> $ time bzcat eowiki.crp.txt.bz2 | apertium eo-en > /dev/null
>> Seems like there is something in the C++ version that makes it go slower and
>> slower and to an almost complete standstill after approx 15000 lines in the
>> corpus.
>> It may not have been clear, but it is apertium-transfer that gets slower
>> and slower.
>>
>
>
>
> --
> <Leftmost> jimregan, that's because deep inside you, you are evil.
> <Leftmost> Also not-so-deep inside you.
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
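
For reference, here is a minimal sketch of the first-letter dispatch Jacob suggests in the quoted message. It assumes the rule file has already been validated, so only the six instruction names that appear in processInstruction() can occur. The names Instr and classify are hypothetical and not part of apertium, and plain const char * stands in for libxml2's xmlChar *. Whether this beats the xmlStrcmp() chain would still have to be measured.

```cpp
// Hypothetical tags for the six instructions handled in processInstruction().
enum class Instr { Choose, Let, Append, Out, CallMacro, ModifyCase, Unknown };

// Classify an element name by looking at its first letter only.
// Assumption: the XML has been validated, so any name starting with 'l'
// is "let", any name starting with 'a' is "append", and so on.
// 'c' is shared by "choose" and "call-macro", so the second letter decides.
Instr classify(const char *name)
{
  if(name == nullptr || name[0] == '\0')
  {
    return Instr::Unknown;
  }
  switch(name[0])
  {
    case 'c': return name[1] == 'h' ? Instr::Choose : Instr::CallMacro;
    case 'l': return Instr::Let;
    case 'a': return Instr::Append;
    case 'o': return Instr::Out;
    case 'm': return Instr::ModifyCase;
    default:  return Instr::Unknown;
  }
}
```

In processInstruction() the if/else chain would then become a switch over classify(...) applied to the node name, with each case calling the matching process* method. The result could also be computed once per node when the rules are loaded, so it is not recomputed for every sentence.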
