On Sat, 2011-05-14 at 01:01 +0100, Jimmy O'Regan wrote:
> On 14 May 2011 00:47, Paulo Schreiner <[email protected]> wrote:
> > On Fri, 2011-05-13 at 23:45 +0100, Jimmy O'Regan wrote:
> >> On 13 May 2011 22:55, Paulo Schreiner <[email protected]> wrote:
> >> > Does anyone here have experience with the apertium tagger?
> >> >
> >> > I have created (to the best of my knowledge) all the required
> >> > resources, but got stuck with the following error:
> >> >
> >> > apertium-tagger -d -s 0 pt.expand pt.tagged.txt pt.tsx pt.prob pt.tagged
> >> > pt.tagged.morf
> >> > Calculating ambiguity classes...
> >> >
> >> > 30 states and 31 ambiguity classes
> >> > Kupiec's initialization of transition and emission probabilities...
> >> > Initializing transition and emission probabilities from a hand-tagged
> >> > corpus...
> >> > {adv}    Word: depois -- {prp,adv}       Word: depois
> >> > Error: A new ambiguity class was found. I cannot continue.
> >> > Word 'depois' not found in the dictionary.
> >> > New ambiguity class: {prp,adv}
> >> > Take a look at the dictionary, then retrain.
> >>
> >> 'depois' needs to be added to the dictionary (as both preposition and
> >> adverb), to match the corpus. In all likelihood, the word is present
> >> (otherwise it couldn't have encountered an ambiguity), so you'll
> >> probably need to look at the commands in the Makefile that are used to
> >> filter the output of lt-expand - it's discarding too much.
> >>
> >
> > Like this? I sorted the expanded file, and the entries seem to be there.
> >
> > depois:depois<adv>
> > Depois:depois<adv>
> > depois:depois<prp>
> > Depois:depois<prp>
> >
> > Any other idea?
> 
> No need for another idea, because I'm right :P
> 
> That's the wrong format. It should match the output of the analyser
> (i.e., you should have entries like:
> ^depois/depois<pr>/depois<adv>$
> instead of what you have).
> 
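To make the format difference concrete, here is a minimal sketch of how lines like the ones I have (surface:lemma<tags>) could be regrouped into one unit per surface form, in the shape Jimmy describes. The function name to_analyser_format is hypothetical, not part of Apertium. It assumes every usable line has the form surface:analysis, and it skips lines containing ":>:" or ":<:" on the assumption that such direction markers may appear in expanded output. The order of the analyses simply follows the input and may not match what the real analyser prints.

```python
from collections import OrderedDict


def to_analyser_format(expanded_lines):
    """Group 'surface:lemma<tags>' lines into '^surface/analysis/...$' units.

    Hypothetical helper for illustration only; not an Apertium tool.
    """
    grouped = OrderedDict()
    for raw in expanded_lines:
        line = raw.strip()
        # Assumption: skip empty lines and lines with direction markers.
        if not line or ":>:" in line or ":<:" in line:
            continue
        surface, sep, analysis = line.partition(":")
        if not sep:
            # No colon at all, so there is no analysis to attach.
            continue
        analyses = grouped.setdefault(surface, [])
        if analysis not in analyses:
            analyses.append(analysis)
    return ["^{}/{}$".format(surface, "/".join(analyses))
            for surface, analyses in grouped.items()]


if __name__ == "__main__":
    sample = [
        "depois:depois<adv>",
        "Depois:depois<adv>",
        "depois:depois<prp>",
        "Depois:depois<prp>",
    ]
    for unit in to_analyser_format(sample):
        print(unit)
```

This only illustrates the target shape of the entries; passing the surface forms through the analyser itself is likely the more reliable way to get output that matches exactly.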

While trying to change the format, I ran into another error:

lt-proc pt.automorf.bin pt.tagged.txt

I quickly get a std::exception when it reaches the "word"
www.gpopai.usp.br/pesquisacl

It is in the dictionary as:
<e><p><l>www.gpopai.usp.br/pesquisacl</l><r>www.gpopai.usp.br/pesquisacl<s n="n"/></r></p></e>

What am I doing wrong?


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff