On Thu, 06 Jan 2011 at 08:50 +0100, Kevin Brubeck Unhammer
wrote:
> Francis Tyers <[email protected]> writes:
> 
> > On Wed, 05 Jan 2011 at 09:32 +0100, Kevin Brubeck Unhammer
> > wrote:
> >> Hi,
> >> 
> >> Is there a bug in 
> >> 
> >>         <modify-case>
> >>           <clip pos="1" side="tl" part="lemh"/>
> >>           <lit v="aa"/>
> >>         </modify-case>
> >> 
> >> when the input is all uppercase, or am I using it wrong?
> >> 
> >> 
> >>     wget http://apertium.codepad.org/GdrOe3nL/raw.txt -O problem.t1x
> >>     wget http://apertium.codepad.org/wo597sse/raw.txt -O problem.dix
> >>     lt-comp lr problem.dix problem.dix.bin
> >>     apertium-preprocess-transfer problem.t1x problem.t1x.bin
> >>     echo '^GUOKTE<Num>$' | apertium-transfer problem.t1x problem.t1x.bin \
> >>         problem.dix.bin
> >> 
> >> 
> >> gives
> >> 
> >> 
> >> ^det<det><qnt>{^tO<det><qnt>$}$
> >> 
> >> 
> >> whereas I was expecting to see
> >> 
> >> 
> >> ^det<det><qnt>{^to<det><qnt>$}$
> >
> > I think the code that deals with this is in transfer.cc 
> >
> > string
> > Transfer::copycase(string const &source_word, string const &target_word)
> >
> > I'm struggling to make heads or tails of that though. In the en-ca
> > rules, you find:
> >
> >               <modify-case>
> >                 <clip pos="1" side="tl" part="lem"/>
> >                 <lit v="aa"/>
> >               </modify-case>
> >
> > and in the es-ca rules too. So I guess you are calling it right.
> >
> > It would seem to be a bug of some description.
> 
> s_word == "aa", t_word == "TO"
> then for s_word: firstupper is false, uppercase is false, sizeone is false
> 
>   if(!uppercase || (sizeone && uppercase))
>   {
>     result = t_word;
>     result[0] = towlower(result[0]);
>     //result = StringUtils::tolower(t_word);
>   }
>   else
>   {
>     result = StringUtils::toupper(t_word);
>   }
>   
>   if(firstupper)
>   {
>     result[0] = towupper(result[0]);
>   }
> 
> gives us "tO" (first test passes). If we change the first test to 
> 
>   if(!uppercase || (sizeone && uppercase))
>   {
>     result = t_word;
>     //result[0] = towlower(result[0]);
>     result = StringUtils::tolower(t_word);
>   }
> 
> we get the expected "to". Does anyone know why we would want to only
> lowercase the first character? 
> 

No, changing this seems like a fairly non-destructive thing to do. If we
had a testing framework we'd know whether it broke any existing behaviour,
but we don't, so it's probably best to run a corpus through the old
version first, then make the change, then run the same corpus through the
new version to make sure there aren't any unexpected differences.
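
For reference, here is a minimal stand-alone sketch of the logic being
discussed, so the two variants can be compared outside of apertium-transfer.
It is not the real Transfer::copycase: the name copycase_sketch and the
lowercase_whole_word switch are hypothetical, and the way firstupper,
uppercase and sizeone are derived from s_word is my assumption (the quoted
snippet only shows how they are used). It works on plain wide strings and
skips the UTF-8 conversion.

```cpp
#include <cwctype>
#include <string>

// Hypothetical stand-alone sketch, NOT the real Transfer::copycase.
// ASSUMPTION: the three flags are derived from the first and last
// characters of s_word; the quoted snippet only shows how they are used.
std::wstring copycase_sketch(const std::wstring &s_word,
                             const std::wstring &t_word,
                             bool lowercase_whole_word)
{
  // Guard added for the sketch: nothing to copy from or onto.
  if(s_word.empty() || t_word.empty())
  {
    return t_word;
  }

  bool firstupper = std::iswupper(s_word[0]);
  bool uppercase = firstupper && std::iswupper(s_word[s_word.size() - 1]);
  bool sizeone = s_word.size() == 1;

  std::wstring result = t_word;
  if(!uppercase || (sizeone && uppercase))
  {
    if(lowercase_whole_word)
    {
      // Proposed change: lowercase every character of the target word.
      for(wchar_t &c : result)
      {
        c = std::towlower(c);
      }
    }
    else
    {
      // Behaviour quoted above: only the first character is lowercased.
      result[0] = std::towlower(result[0]);
    }
  }
  else
  {
    // Source pattern is all uppercase: uppercase the whole target word.
    for(wchar_t &c : result)
    {
      c = std::towupper(c);
    }
  }

  if(firstupper)
  {
    result[0] = std::towupper(result[0]);
  }
  return result;
}
```

With s_word "aa" and t_word "TO", passing false for the last argument
gives "tO" and passing true gives "to", which matches what Kevin describes
above.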

Fran

