Is it possible to fix this without removing the regular expressions?

Fran

On Thu, 15 Apr 2010 at 12:46 +0200, Gema Ramírez-Sánchez wrote:
> I remember this problem but I couldn't remember what it was. I have
> just asked Sergio and the problem was the regular expression for
> "proper names" in dixes (I think they were experiencing it in en-es)
> 
> Hope it helps,
> 
> Gema.
> 
> On Sun, Apr 11, 2010 at 11:45 PM, Jimmy O'Regan <[email protected]> wrote:
> > On 11 April 2010 22:33, Jacob Nordfalk <[email protected]> wrote:
> >>
> >>
> >> 2010/4/9 Francis Tyers <[email protected]>
> >>>
> >>> On Fri, 09 Apr 2010 at 22:21 +0200, Maciej Jaskowski wrote:
> >>>
> >>> > Apparently, however, we have somewhat more demanding classes in Poland
> >>> > ;-)
> >>> > We are not only to do profiling of some open-source stuff but also to
> >>> > boost things (in terms of speed or memory), create patch etc.
> >>> >
> >>> > So my question is a little bit different: in which app/module/lib do
> >>> > you think it is most likely some improvements could be made? And
> >>> > well... of course the bigger the possible improvement the better. The
> >>> > more important place, the better too ;-)
> >>
> >> I'll take the liberty of quoting myself (again, again :-) ):
> >> Thinking more about this, I'd say I already know what you will find:
> >> 1) Transfer is using 95% of the CPU in the translation process
> >> 1a) Interpretation / XML tree walking is the major culprit, taking about
> >> (say) 60%
> >> 1b) Repeated use of the same regexps is also a culprit, taking (say) 25%
> >> So this is the place to improve: either pre-compile the XML or make the
> >> XML interpretation faster. Take a look at, e.g.,
> >> Transfer::processOut(xmlNode *localroot) in apertium/transfer.cc:
> >> void
> >> Transfer::processOut(xmlNode *localroot)
> >> {
> >>   for(xmlNode *i = localroot->children; i != NULL; i = i->next)
> >>   {
> >>     if(i->type == XML_ELEMENT_NODE)
> >>     {
> >> Here, a first optimization could be to pre-process the XML node tree so
> >> that all children of a type other than XML_ELEMENT_NODE are cut away from
> >> the tree beforehand. I haven't checked, but I suppose all the if-statements
> >> are there to skip comments and whitespace.
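A minimal sketch of that pre-processing pass, in case it helps whoever picks this up. To keep it self-contained it does not use libxml2: Node, NodeType and pruneNonElements are hypothetical stand-ins that only mirror the type/children/next fields used in the loop above. With real xmlNode trees the prev/parent/last pointers need updating too, so xmlUnlinkNode() followed by xmlFreeNode() would probably be the way to detach each node. It also assumes that text and comment nodes carry no meaning for the interpreter, which I have not checked.

```cpp
#include <cstddef>

// Hypothetical stand-in for the xmlNode fields the transfer loop reads.
enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8 };

struct Node {
  NodeType type;
  Node *children;  // first child, or nullptr
  Node *next;      // next sibling, or nullptr
};

// Walk the tree once and unlink every child that is not an element node.
// Returns the number of nodes unlinked. The unlinked nodes are not freed
// here; in this sketch the caller still owns them.
std::size_t pruneNonElements(Node *root)
{
  std::size_t removed = 0;
  Node **link = &root->children;  // the pointer that currently leads to *link
  while (*link != nullptr) {
    Node *cur = *link;
    if (cur->type != ELEMENT_NODE) {
      *link = cur->next;          // bypass the comment/whitespace node
      cur->next = nullptr;
      ++removed;
    } else {
      removed += pruneNonElements(cur);  // clean the subtree as well
      link = &cur->next;
    }
  }
  return removed;
}
```

If this runs once after the rules file is loaded, the per-node type check in loops like the one above would always succeed and could, in principle, be dropped.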
> >> Another optimization could be to make faster versions of e.g.
> >> Transfer::processInstruction(xmlNode *localroot)
> >> {
> >>   if(!xmlStrcmp(localroot->name, (const xmlChar *) "choose"))
> >>   {
> >>     processChoose(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "let"))
> >>   {
> >>     processLet(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "append"))
> >>   {
> >>     processAppend(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "out"))
> >>   {
> >>     processOut(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "call-macro"))
> >>   {
> >>     processCallMacro(localroot);
> >>   }
> >>   else if(!xmlStrcmp(localroot->name, (const xmlChar *) "modify-case"))
> >>   {
> >>     processModifyCase(localroot);
> >>   }
> >> }
> >> Check whether strcmp() would be faster than xmlStrcmp(). And, as the XML
> >> can be assumed to validate against a DTD, switch on the first letter
> >> (with an extra branch on 'c' for 'choose' and 'call-macro').
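For illustration, a sketch of the first-letter dispatch described above. The Instruction enum and classify() are hypothetical names, not anything in apertium/transfer.cc, and the function takes a plain const char * rather than an xmlChar *. Unlike the suggestion, it does not rely entirely on DTD validation: the switch picks a single candidate and one strcmp() confirms it, so an unexpected name is still ignored, as in the original if/else chain.

```cpp
#include <cstring>

// Hypothetical tags for the six instructions handled by processInstruction.
enum Instruction { CHOOSE, LET, APPEND, OUT, CALL_MACRO, MODIFY_CASE, UNKNOWN };

// Pick one candidate from the first letter (second letter for 'c'), then
// confirm it with a single strcmp instead of up to six comparisons.
Instruction classify(const char *name)
{
  switch (name[0]) {
    case 'c':
      // "choose" and "call-macro" share 'c'; the second letter separates them.
      if (name[1] == 'h') return std::strcmp(name, "choose") == 0 ? CHOOSE : UNKNOWN;
      if (name[1] == 'a') return std::strcmp(name, "call-macro") == 0 ? CALL_MACRO : UNKNOWN;
      return UNKNOWN;
    case 'l': return std::strcmp(name, "let") == 0 ? LET : UNKNOWN;
    case 'a': return std::strcmp(name, "append") == 0 ? APPEND : UNKNOWN;
    case 'o': return std::strcmp(name, "out") == 0 ? OUT : UNKNOWN;
    case 'm': return std::strcmp(name, "modify-case") == 0 ? MODIFY_CASE : UNKNOWN;
    default:  return UNKNOWN;
  }
}
```

The caller would then switch on the returned tag and call processChoose(), processLet() and so on. Whether this beats the existing chain in practice would need profiling.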
> >>
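On point 1b (repeated use of the same regexps): if part of that cost comes from compiling the same pattern more than once, a small cache would avoid it. The sketch below uses std::regex purely as a stand-in, since the regex wrapper that transfer actually uses is not shown in this thread. RegexCache and its members are hypothetical names. It does nothing for the cost of matching itself.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical cache: compile each distinct pattern once, reuse it afterwards.
class RegexCache {
public:
  // Returns the compiled regex for 'pattern', compiling it on first use only.
  // References into an unordered_map stay valid when later entries are added.
  const std::regex &get(const std::string &pattern)
  {
    auto it = cache_.find(pattern);
    if (it == cache_.end()) {
      it = cache_.emplace(pattern, std::regex(pattern)).first;
      ++compilations_;
    }
    return it->second;
  }

  // Number of patterns actually compiled so far (useful when profiling).
  std::size_t compilations() const { return compilations_; }

private:
  std::unordered_map<std::string, std::regex> cache_;
  std::size_t compilations_ = 0;
};
```

Asking the cache twice for the same pattern string returns the same compiled object, so the compile cost is paid once per distinct pattern rather than once per use.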
> >>>
> >>> According to Jacob there is some issue with lt-proc when analysing many
> >>> sentences. It would be good to profile this, and find out where it is
> >>> happening and fix it.
> >>
> >> I think Fran means 'transfer'.
> >>
> >
> > Some guys at DCU were having this problem with lt-proc (i.e., not
> > using transfer)
> >
> >> I wrote:
> >> $ time bzcat eowiki.crp.txt.bz2 | apertium eo-en > /dev/null
> >> It seems there is something in the C++ version that makes it go slower
> >> and slower, to an almost complete standstill after approx. 15000 lines in
> >> the corpus.
> >> It may not have been clear, but it is apertium-transfer that gets slower
> >> and slower.
> >>
> >
> >
> >
> > --
> > <Leftmost> jimregan, that's because deep inside you, you are evil.
> > <Leftmost> Also not-so-deep inside you.
> >
> > ------------------------------------------------------------------------------
> > Download Intel® Parallel Studio Eval
> > Try the new software tools for yourself. Speed compiling, find bugs
> > proactively, and fine-tune applications for parallel performance.
> > See why Intel Parallel Studio got high marks during beta.
> > http://p.sf.net/sfu/intel-sw-dev
> > _______________________________________________
> > Apertium-stuff mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
> >
> 


