On 14 May 2011 01:54, Paulo Schreiner <[email protected]> wrote:
> On Sat, 2011-05-14 at 01:01 +0100, Jimmy O'Regan wrote:
>> On 14 May 2011 00:47, Paulo Schreiner <[email protected]> wrote:
>> > On Fri, 2011-05-13 at 23:45 +0100, Jimmy O'Regan wrote:
>> >> On 13 May 2011 22:55, Paulo Schreiner <[email protected]> wrote:
>> >> > Does anyone here have some experience with the apertium tagger?
>> >> >
>> >> > I have created (to the best of my knowledge) all required resources,
>> >> > but got stuck with the following error:
>> >> >
>> >> > apertium-tagger -d -s 0 pt.expand pt.tagged.txt pt.tsx pt.prob pt.tagged pt.tagged.morf
>> >> > Calculating ambiguity classes...
>> >> >
>> >> > 30 states and 31 ambiguity classes
>> >> > Kupiec's initialization of transition and emission probabilities...
>> >> > Initializing transition and emission probabilities from a hand-tagged corpus...
>> >> > {adv} Word: depois -- {prp,adv} Word: depois
>> >> > Error: A new ambiguity class was found. I cannot continue.
>> >> > Word 'depois' not found in the dictionary.
>> >> > New ambiguity class: {prp,adv}
>> >> > Take a look at the dictionary, then retrain.
>> >>
>> >> 'depois' needs to be added to the dictionary (as both preposition and
>> >> adverb), to match the corpus. In all likelihood, the word is present
>> >> (otherwise it couldn't have encountered an ambiguity), so you'll
>> >> probably need to look at the commands in the Makefile that are used to
>> >> filter the output of lt-expand - it's discarding too much.
>> >>
>> >
>> > Like this? I sorted the expanded file, and it seems they are there.
>> >
>> > depois:depois<adv>
>> > Depois:depois<adv>
>> > depois:depois<prp>
>> > Depois:depois<prp>
>> >
>> > Any other ideas?
>>
>> No need for another idea, because I'm right :P
>>
>> That's the wrong format. It should match the output of the analyser
>> (i.e., you should have entries like:
>> ^depois/depois<pr>/depois<adv>$
>> instead of what you have).
>>
>
> In trying to change the format, I uncovered another error:
>
> lt-proc pt.automorf.bin pt.tagged.txt
>
> I soon get a std::exception when it encounters the "word"
> www.gpopai.usp.br/pesquisacl
>
> It's in the dictionary as:
> <e><p><l>www.gpopai.usp.br/pesquisacl</l><r>www.gpopai.usp.br/pesquisacl<s n="n"/></r></p></e>
>
> What am I doing wrong?

You need to escape '/' because it's a special character. Piping
through apertium-destxt should be enough.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
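
To illustrate the escaping suggested in the reply, here is a minimal Python sketch of what "escaping '/'" could look like if done by hand rather than by piping through apertium-destxt. The function name escape_reserved and the RESERVED set are hypothetical and chosen for illustration only. The thread confirms only that '/' is special; the other characters in the set are an assumption and should be checked against the stream format documentation.

```python
# Characters assumed to be special in the Apertium stream format.
# Only '/' is confirmed by the thread; the rest of this set is an
# assumption made for illustration.
RESERVED = set("^$/<>{}[]\\@")


def escape_reserved(text: str) -> str:
    """Prefix every character in RESERVED with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


if __name__ == "__main__":
    # The "word" from the corpus that triggered the std::exception.
    print(escape_reserved("www.gpopai.usp.br/pesquisacl"))
    # prints: www.gpopai.usp.br\/pesquisacl
```

In practice, running the corpus through apertium-destxt before lt-proc, as suggested above, is probably the simpler route, since it avoids maintaining a hand-written list of special characters.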
