On Thu, 06 Jan 2011 at 08:50 +0100, Kevin Brubeck Unhammer wrote:
> Francis Tyers <[email protected]> writes:
>
> > On Wed, 05 Jan 2011 at 09:32 +0100, Kevin Brubeck Unhammer wrote:
> >> Hi,
> >>
> >> Is there a bug in
> >>
> >>   <modify-case>
> >>     <clip pos="1" side="tl" part="lemh"/>
> >>     <lit v="aa"/>
> >>   </modify-case>
> >>
> >> when the input is all uppercase, or am I using it wrong?
> >>
> >>   wget http://apertium.codepad.org/GdrOe3nL/raw.txt -O problem.t1x
> >>   wget http://apertium.codepad.org/wo597sse/raw.txt -O problem.dix
> >>   lt-comp lr problem.dix problem.dix.bin
> >>   apertium-preprocess-transfer problem.t1x problem.t1x.bin
> >>   echo '^GUOKTE<Num>$' | apertium-transfer problem.t1x problem.t1x.bin problem.dix.bin
> >>
> >> gives
> >>
> >>   ^det<det><qnt>{^tO<det><qnt>$}$
> >>
> >> whereas I was expecting to see
> >>
> >>   ^det<det><qnt>{^to<det><qnt>$}$
> >
> > I think the code that deals with this is in transfer.cc
> >
> >   string
> >   Transfer::copycase(string const &source_word, string const &target_word)
> >
> > I'm struggling to make heads or tails of that though. In the en-ca
> > rules, you find:
> >
> >   <modify-case>
> >     <clip pos="1" side="tl" part="lem"/>
> >     <lit v="aa"/>
> >   </modify-case>
> >
> > and in the es-ca rules too. So I guess you are calling it right.
> >
> > It would seem to be a bug of some description.
>
> s_word == "aa", t_word == "TO"
> then for s_word: firstupper is false, uppercase is false, sizeone is false
>
>   if(!uppercase || (sizeone && uppercase))
>   {
>     result = t_word;
>     result[0] = towlower(result[0]);
>     //result = StringUtils::tolower(t_word);
>   }
>   else
>   {
>     result = StringUtils::toupper(t_word);
>   }
>
>   if(firstupper)
>   {
>     result[0] = towupper(result[0]);
>   }
>
> gives us "tO" (first test passes).
> If we change the first test to
>
>   if(!uppercase || (sizeone && uppercase))
>   {
>     result = t_word;
>     //result[0] = towlower(result[0]);
>     result = StringUtils::tolower(t_word);
>   }
>
> we get the expected "to". Does anyone know why we would want to only
> lowercase the first character?
>
No. Changing this seems like a fairly non-destructive thing to do. If we
had a testing framework we would know whether it broke anything that used
to work, but we don't, so it's probably best to run a corpus through the
old version first, then make the change and run the same corpus through
the new version to check that there are no unexpected differences.

Fran

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
