On Fri, 02 Sep 2011 at 11:13 +0200, Kevin Brubeck Unhammer wrote:
> Kevin Brubeck Unhammer <[email protected]> writes:
> 
> > Francis Tyers <[email protected]> writes:
> >
> >> On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
> >>> On Sunday 28 February 2010, Francis Tyers wrote:
> >>> > > I don't know Icelandic at all and therefore can't tell whether some of
> >>> > > the  words are accepted or rejected incorrectly.
> >>> > 
> >>> > Nice, it looks good. Some of the capitalised words should be recognised
> >>> > as correct, at least 'Bretlandi' and 'Norðmenn'.
> >>> 
> >>> I tried to fix the checking of capitalized words but started to run into
> >>> problems. It seems that the library API works in somewhat surprising (at
> >>> least to me) ways when you enter a word that starts with a capital letter
> >>> and ends with garbage.
> >>> 
> >>> The implementation is here
> >>> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
> >>> 
> >>> and test cases here
> >>> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
> >>> 
> >>> I was able to get all test cases except the one with TODO in its method
> >>> name to pass. How would you suggest fixing the code so that all tests
> >>> would pass? Of course a patch would be most welcome :)
> >>
> >> Hmm, strangely enough, when I try an unknown word I get similar strange
> >> output:
> >>
> >> $ ./test mor.bin 
> >> ^Reykjanghfghesi$ -->
> >> ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> >
> > Seems to be a bug with partly-matching regexes in the biltrans
> > functions.
> >
> > Testing the different functions, I get:
> >
> >     biltransWithQueue: 
> > ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> >  qSize: 0
> >     biltransWithoutQueue: 
> > ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> >     biltrans: 
> > ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> >     biltransfull: ^$
> >
> > But, if I comment out the two regex entries
> >
> >     <e>                      <par n="persons"/></e>
> >     <e>                      <par n="organisations"/></e>
> >
> > at the end of apertium-is-en.is.dix, I get
> >
> >     biltransWithQueue: @Reykjanghfghesi qSize: 0
> >     biltransWithoutQueue: @Reykjanghfghesi
> >     biltrans: @Reykjanghfghesi
> >     biltransfull: @Reykjanghfghesi
> >
> > Similarly on the command line with lt-proc -b (while regular lt-proc -a
> > returns unknown, as it should – the persons/organisations regexes don't
> > fully match either).
> 
> I put a patch up at
> http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=131 which
> solves this for both lt-proc -b, as well as biltransWithQueue. Please
> test.
> 
> I haven't tried with the other biltrans* functions (I can't see that
> they're actually used in the rest of Apertium, so I'm not sure what
> they're there for).
> 
> It also fixes a problem where superfluous characters after tags would
> pass as matches in lt-proc -b (this bug was not present in
> biltransWithQueue). It's still possible to carry over _tags_ after the
> analysis of course.
> 
> 
> I guess it's not strange that this bug was here, since normally you
> never have words without tags in bidix, but when using these functions
> on a monodix it of course becomes a problem. (And, although it's not
> recommended, if people really do want to have non-tagged lemmas in
> bidix, lttoolbox should at least not give analyses for lemmas that are
> _not_ in the bidix.)
> 
> 
> best regards,
> Kevin Brubeck Unhammer
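
To make the failure mode concrete, here is a toy sketch in Python. It is not
lttoolbox code: the ENTRIES table and both function names are made up for
illustration, and a plain dict stands in for the compiled dictionary. It only
shows the difference between accepting a match on a prefix of the word and
requiring that the whole word be consumed.

```python
# Toy model, NOT lttoolbox code: a dict of surface forms stands in for the
# compiled dictionary. All names here are hypothetical.
ENTRIES = {
    "Reykja": ["Reykja<vblex><actv><inf>", "Reykur<n><m><pl><gen><ind>"],
}


def lookup_prefix(word):
    """Buggy behaviour: accept the longest entry that is a prefix of word."""
    for end in range(len(word), 0, -1):
        if word[:end] in ENTRIES:
            return "^" + "/".join(ENTRIES[word[:end]]) + "$"
    return "@" + word


def lookup_full(word):
    """Intended behaviour: accept only if the whole word is an entry."""
    if word in ENTRIES:
        return "^" + "/".join(ENTRIES[word]) + "$"
    return "@" + word
```

With the prefix version, 'Reykjanghfghesi' gets the analyses of 'Reykja' and
the trailing garbage is silently dropped; with the full-match version it comes
back marked as unknown with a leading '@'.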

Looks good to me, and to Jim. We suggest commit and close. I'm going to
do one final test, running a corpus through lt-proc -b before and after
the patch to see if there are any differences. I'll report back soon.
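
A rough sketch of how such a before/after comparison could be done, assuming
the output of each run has already been read into a list of lines. The
function name changed_lines is hypothetical; only Python's standard difflib
module is used.

```python
import difflib


def changed_lines(before, after):
    """Return unified-diff lines between two lists of output lines.

    An empty list means the two runs produced identical output.
    """
    return list(
        difflib.unified_diff(before, after, "before", "after", lineterm="")
    )


# Example with the outputs quoted in this thread (unpatched vs. patched):
old = ["^Reykja<vblex><actv><inf>/Reykur<n><m><pl><gen><ind>$"]
new = ["@Reykjanghfghesi"]
for line in changed_lines(old, new):
    print(line)
```

Every line the diff reports is a word whose lookup result changed with the
patch, so each one can be checked by hand.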

Fran

