Is it possible to fix this without removing the regular expressions?

Fran

On Thu, 15 Apr 2010 at 12:46 +0200, Gema Ramírez-Sánchez wrote:
> I remember this problem but I couldn't remember what it was. I have
> just asked Sergio and the problem was the regular expression for
> "proper names" in dixes (I think they were experiencing it in en-es)
>
> Hope it helps,
>
> Gema.
>
> On Sun, Apr 11, 2010 at 11:45 PM, Jimmy O'Regan <[email protected]> wrote:
> > On 11 April 2010 22:33, Jacob Nordfalk <[email protected]> wrote:
> >>
> >> 2010/4/9 Francis Tyers <[email protected]>
> >>>
> >>> On Fri, 09 Apr 2010 at 22:21 +0200, Maciej Jaskowski wrote:
> >>>
> >>> > Apparently, however, we have somewhat more demanding classes in Poland
> >>> > ;-)
> >>> > We are not only to do profiling of some open-source stuff but also to
> >>> > boost things (in terms of speed or memory), create a patch, etc.
> >>> >
> >>> > So my question is a little bit different: in which app/module/lib do
> >>> > you think it is most likely some improvements could be made? And
> >>> > well... of course the bigger the possible improvement the better. The
> >>> > more important place, the better too ;-)
> >>
> >> I'll take the freedom to cite myself (again, again :-) :
> >> Thinking more about this, I'd say I already know what you will find:
> >> 1) Transfer is using 95% of the CPU in the translation process
> >> 1a) interpretation / XML tree walking is the major culprit, taking about
> >> (say) 60%
> >> 1b) Repeated use (of the same) regexps is also a culprit, taking (say)
> >> 25%
> >> So, this will be the place of improvement - so pre-compile the XML or
> >> make the XML interpretation faster.
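A minimal sketch of what point 1b could look like in practice, under the assumption that the cost comes from compiling the same pattern over and over: the regular expressions stay, but each distinct one is compiled once and then reused. It uses std::regex from the C++ standard library purely for illustration; RegexCache and matches are hypothetical names and not part of Apertium, and whether this is really where transfer loses its time would have to be confirmed by profiling.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of Apertium: compiles each distinct
// pattern once and hands back the same compiled object afterwards.
class RegexCache
{
public:
  // Returns the compiled regex for `pattern`, compiling it on first use only.
  const std::regex &get(const std::string &pattern)
  {
    auto it = cache.find(pattern);
    if(it == cache.end())
    {
      it = cache.emplace(pattern, std::regex(pattern)).first;
    }
    return it->second;
  }

  // Number of distinct patterns compiled so far.
  std::size_t size() const
  {
    return cache.size();
  }

private:
  std::unordered_map<std::string, std::regex> cache;
};

// Searches `text` for `pattern`, going through the cache so that a
// repeated pattern is not compiled again.
bool matches(RegexCache &cache, const std::string &pattern,
             const std::string &text)
{
  return std::regex_search(text, cache.get(pattern));
}
```

Calling matches(cache, "<n>", "^house<n><sg>$") twice with the same cache compiles "<n>" only on the first call; cache.size() gives a quick way to check how many distinct patterns a run really needs.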
> >> Take a look at, e.g., Transfer::processOut(xmlNode *localroot) in
> >> apertium/transfer.cc:
> >>
> >> void
> >> Transfer::processOut(xmlNode *localroot)
> >> {
> >>   for(xmlNode *i = localroot->children; i != NULL; i = i->next)
> >>   {
> >>     if(i->type == XML_ELEMENT_NODE)
> >>     {
> >>
> >> Here, a first optimization could be to pre-process the XML node tree so
> >> that all children of any type other than XML_ELEMENT_NODE are cut away
> >> from the tree beforehand. I haven't checked but I suppose all the
> >> if-statements are there to skip comments and whitespace.
> >> Another optimization could be to make faster versions of e.g.
> >>
> >> void
> >> Transfer::processInstruction(xmlNode *localroot)
> >> {
> >>   if(!xmlStrcmp(localroot->name, (const xmlChar *) "choose"))
> >>   {
> >>     processChoose(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "let"))
> >>   {
> >>     processLet(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "append"))
> >>   {
> >>     processAppend(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "out"))
> >>   {
> >>     processOut(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "call-macro"))
> >>   {
> >>     processCallMacro(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "modify-case"))
> >>   {
> >>     processModifyCase(localroot);
> >>   }
> >> }
> >>
> >> Check whether strcmp() would be faster than xmlStrcmp(). And, as the XML
> >> can be assumed to validate against a DTD, make a switch on the first
> >> letter (with an extra branch on 'c' for 'choose' and 'call-macro').
> >>
> >>>
> >>> According to Jacob there is some issue with lt-proc when analysing many
> >>> sentences. It would be good to profile this, and find out where it is
> >>> happening and fix it.
> >>
> >> I think Fran means 'transfer'.
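The first-letter switch suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: Instr, same and classify are hypothetical names that do not exist in apertium/transfer.cc, and the sketch still confirms the name with one strcmp() per call instead of trusting the DTD completely, which is a more cautious variant than the one proposed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical instruction codes, one per element name handled by the
// quoted Transfer::processInstruction().
enum class Instr
{
  Choose, Let, Append, Out, CallMacro, ModifyCase, Unknown
};

// Returns true if `name` equals `expected`.
static bool same(const char *name, const char *expected)
{
  return std::strcmp(name, expected) == 0;
}

// Branches on the first letter, then confirms with strcmp().
// 'c' needs two comparisons because "choose" and "call-macro" share it.
Instr classify(const char *name)
{
  assert(name != nullptr);
  switch(name[0])
  {
    case 'c':
      if(same(name, "choose")) return Instr::Choose;
      if(same(name, "call-macro")) return Instr::CallMacro;
      break;
    case 'l':
      if(same(name, "let")) return Instr::Let;
      break;
    case 'a':
      if(same(name, "append")) return Instr::Append;
      break;
    case 'o':
      if(same(name, "out")) return Instr::Out;
      break;
    case 'm':
      if(same(name, "modify-case")) return Instr::ModifyCase;
      break;
    default:
      break;
  }
  return Instr::Unknown;
}
```

With libxml2, localroot->name is a const xmlChar * (an unsigned char string), so it would need a cast to const char * before being passed to classify(). Computing the code once per node when the rules file is loaded, rather than on every call, would be closer to the "pre-compile the XML" idea, but that is a larger change than this sketch covers.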
> >>
> >
> > Some guys at DCU were having this problem with lt-proc (i.e., not
> > using transfer)
> >
> >> I wrote:
> >> $ time bzcat eowiki.crp.txt.bz2 | apertium eo-en > /dev/null
> >> Seems like there is something in the C++ version that makes it go slower
> >> and slower, to an almost complete standstill after approx. 15000 lines
> >> in the corpus.
> >> It may not have been clear, but it is apertium-transfer that gets slower
> >> and slower.
> >>
> >
> > --
> > <Leftmost> jimregan, that's because deep inside you, you are evil.
> > <Leftmost> Also not-so-deep inside you.
> >
> > ------------------------------------------------------------------------------
> > Download Intel® Parallel Studio Eval
> > Try the new software tools for yourself. Speed compiling, find bugs
> > proactively, and fine-tune applications for parallel performance.
> > See why Intel Parallel Studio got high marks during beta.
> > http://p.sf.net/sfu/intel-sw-dev
> > _______________________________________________
> > Apertium-stuff mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
