On Fri, 02 Sep 2011 at 11:13 +0200, Kevin Brubeck Unhammer wrote:
> Kevin Brubeck Unhammer <[email protected]> writes:
>
> > Francis Tyers <[email protected]> writes:
> >
> >> On Sun, 28 Feb 2010 at 21:40 +0200, Harri Pitkänen wrote:
> >>> On Sunday 28 February 2010, Francis Tyers wrote:
> >>> > > I don't know Icelandic at all and therefore can't tell whether some of
> >>> > > the words are accepted or rejected incorrectly.
> >>> >
> >>> > Nice, it looks good. Some of the capitalised words should be recognised
> >>> > correctly, at least 'Bretlandi' and 'Norðmenn'.
> >>>
> >>> I tried to fix the checking of capitalized words but started to run into
> >>> problems. It seems that the library API works in somewhat surprising (at
> >>> least to me) ways when you enter a word that starts with a capital letter
> >>> and ends with garbage.
> >>>
> >>> The implementation is here
> >>> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/src/morphology/LttoolboxAnalyzer.cpp?revision=3182&view=markup
> >>>
> >>> and test cases here
> >>> http://voikko.svn.sourceforge.net/viewvc/voikko/trunk/libvoikko/python/ApertiumIcelandicTest.py?revision=3183&view=markup
> >>>
> >>> I was able to get all test cases except the one with TODO in the method
> >>> name implemented. How would you suggest fixing the code so that all tests
> >>> would pass? Of course a patch would be most welcome :)
> >>
> >> Hmm, strangely enough, when I try an unknown word I get similarly strange
> >> output:
> >>
> >> $ ./test mor.bin
> >> ^Reykjanghfghesi$ -->
> >> ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> >
> > Seems to be a bug with partly-matching regexes in the biltrans
> > functions.
> >
> > Testing the different functions, I get:
> >
> > biltransWithQueue:
> > ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> > qSize: 0
> > biltransWithoutQueue:
> > ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> > biltrans:
> > ^Reykja<vblex><actv><inf>/Reykja<vblex><actv><pri><p3><pl>/Reykur<n><m><pl><gen><ind>$
> > biltransfull: ^$
> >
> > But, if I comment out the two regex entries
> >
> > <e> <par n="persons"/></e>
> > <e> <par n="organisations"/></e>
> >
> > at the end of apertium-is-en.is.dix, I get
> >
> > biltransWithQueue: @Reykjanghfghesi qSize: 0
> > biltransWithoutQueue: @Reykjanghfghesi
> > biltrans: @Reykjanghfghesi
> > biltransfull: @Reykjanghfghesi
> >
> > Similarly on the command line with lt-proc -b (while regular lt-proc -a
> > returns unknown, as it should – the persons/organisations regexes don't
> > fully match either).
>
> I put a patch up at
> http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=131 which
> solves this for both lt-proc -b and biltransWithQueue. Please test.
>
> I haven't tried with the other biltrans* functions (I can't see that
> they're actually used in the rest of Apertium, so I'm not sure what
> they're there for).
>
> It also fixes a problem where superfluous characters after tags would
> pass as matches in lt-proc -b (this bug was not present in
> biltransWithQueue). It's still possible to carry over _tags_ after the
> analysis of course.
>
>
> I guess it's not strange that this bug was here, since normally you
> never have words without tags in bidix, but when using these functions
> on a monodix it of course becomes a problem. (And, although it's not
> recommended, if people really do want to have non-tagged lemmas in
> bidix, lttoolbox should at least not give analyses for lemmas that are
> _not_ in the bidix.)
>
>
> best regards,
> Kevin Brubeck Unhammer

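For anyone following along, here is how I understand the failure mode, as a toy sketch in Python. This is not lttoolbox code and not the patch itself: the dictionary contents and the function names lookup_prefix and lookup_full are made up for illustration, and the real transducer walks states rather than slicing strings. The point is only the difference between accepting a matching prefix and requiring the whole word to be consumed.

```python
# Toy model of the behaviour described above. NOT lttoolbox code:
# TOY_DIX, lookup_prefix and lookup_full are hypothetical names.
TOY_DIX = {
    "Reykja": ["Reykja<vblex><actv><inf>", "Reykur<n><m><pl><gen><ind>"],
}


def format_analyses(analyses):
    """Join analyses in the ^a/b$ stream style used in the output above."""
    return "^" + "/".join(analyses) + "$"


def lookup_prefix(word, dix=TOY_DIX):
    """Buggy behaviour: accept the longest entry that is a prefix of the word."""
    for end in range(len(word), 0, -1):
        if word[:end] in dix:
            # Trailing characters (word[end:]) are silently dropped here.
            return format_analyses(dix[word[:end]])
    return "@" + word


def lookup_full(word, dix=TOY_DIX):
    """Intended behaviour: accept only if the entire word is an entry."""
    if word in dix:
        return format_analyses(dix[word])
    return "@" + word


if __name__ == "__main__":
    for w in ("Reykja", "Reykjanghfghesi"):
        print(w, lookup_prefix(w), lookup_full(w))
```

With this toy dictionary, lookup_prefix hands back the analyses of 'Reykja' for 'Reykjanghfghesi', while lookup_full marks it as unknown with '@', which is the behaviour we want from lt-proc -b.
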
Looks good to me, and to Jim. We suggest committing the patch and closing
the bug.

I'm going to do one final test: running a corpus through lt-proc -b before
and after the patch to see whether there are any differences. I'll report
back soon.

Fran

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
