On Sat, 2011-05-14 at 01:01 +0100, Jimmy O'Regan wrote:
> On 14 May 2011 00:47, Paulo Schreiner <[email protected]> wrote:
> > On Fri, 2011-05-13 at 23:45 +0100, Jimmy O'Regan wrote:
> >> On 13 May 2011 22:55, Paulo Schreiner <[email protected]> wrote:
> >> > Does anyone here have some experience with the apertium tagger?
> >> >
> >> > I have created (to the best of my knowledge) all the required resources,
> >> > but got stuck with the following error:
> >> >
> >> > apertium-tagger -d -s 0 pt.expand pt.tagged.txt pt.tsx pt.prob pt.tagged pt.tagged.morf
> >> > Calculating ambiguity classes...
> >> >
> >> > 30 states and 31 ambiguity classes
> >> > Kupiec's initialization of transition and emission probabilities...
> >> > Initializing transition and emission probabilities from a hand-tagged
> >> > corpus...
> >> > {adv} Word: depois -- {prp,adv} Word: depois
> >> > Error: A new ambiguity class was found. I cannot continue.
> >> > Word 'depois' not found in the dictionary.
> >> > New ambiguity class: {prp,adv}
> >> > Take a look at the dictionary, then retrain.
> >>
> >> 'depois' needs to be added to the dictionary (as both preposition and
> >> adverb), to match the corpus. In all likelihood, the word is present
> >> (otherwise it couldn't have encountered an ambiguity), so you'll
> >> probably need to look at the commands in the Makefile that are used to
> >> filter the output of lt-expand - it's discarding too much.
> >>
> >
> > Like this? I sorted the expanded file, and it seems they are there.
> >
> > depois:depois<adv>
> > Depois:depois<adv>
> > depois:depois<prp>
> > Depois:depois<prp>
> >
> > Any other idea?
>
> No need for another idea, because I'm right :P
>
> That's the wrong format. It should match the output of the analyser
> (i.e., you should have entries like:
> ^depois/depois<pr>/depois<adv>$
> instead of what you have).
>
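To illustrate the difference between the two formats, here is a rough Python sketch that groups lines in the lt-expand format (surface:lemma<tags>) by surface form and emits them in the analyser's ^surface/analysis1/analysis2$ stream format. The function name expand_to_stream is hypothetical: it is not part of Apertium and not what the Makefile actually runs. Treating ":<:" lines as generation-only (skipped) and ":>:" lines as ordinary entries is an assumption. It only shows the target shape; the surest way to match the analyser exactly is to let lt-proc itself produce the entries.

```python
from collections import OrderedDict


def expand_to_stream(lines):
    """Group lt-expand style "surface:lemma<tags>" lines into analyser-style
    stream units "^surface/analysis1/analysis2$".

    Assumption: lines containing ":<:" are generation-only and are skipped,
    while ":>:" is treated like a plain ":" separator.
    """
    analyses = OrderedDict()  # surface form -> readings, in first-seen order
    for raw in lines:
        line = raw.strip()
        if not line or ":<:" in line:
            continue  # blank line or (assumed) generation-only entry
        line = line.replace(":>:", ":", 1)
        surface, _, analysis = line.partition(":")
        readings = analyses.setdefault(surface, [])
        if analysis not in readings:
            readings.append(analysis)  # keep each reading only once
    return ["^" + surface + "/" + "/".join(readings) + "$"
            for surface, readings in analyses.items()]


if __name__ == "__main__":
    expanded = [
        "depois:depois<adv>",
        "Depois:depois<adv>",
        "depois:depois<prp>",
        "Depois:depois<prp>",
    ]
    for unit in expand_to_stream(expanded):
        print(unit)
```

With the four lines from the sorted expanded file, this prints ^depois/depois<adv>/depois<prp>$ and ^Depois/depois<adv>/depois<prp>$. Each surface form now carries its full ambiguity class {prp,adv} on one line, which is the information the tagger compares against the corpus.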
While trying to change the format, I uncovered another error. When I run

lt-proc pt.automorf.bin pt.tagged.txt

I soon get a std::exception when it encounters the "word"

www.gpopai.usp.br/pesquisacl

It's in the dictionary as:

<e><p><l>www.gpopai.usp.br/pesquisacl</l><r>www.gpopai.usp.br/pesquisacl<s n="n"/></r></p></e>

What am I doing wrong?

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
