On 14 May 2011 01:54, Paulo Schreiner <[email protected]> wrote:
> On Sat, 2011-05-14 at 01:01 +0100, Jimmy O'Regan wrote:
>> On 14 May 2011 00:47, Paulo Schreiner <[email protected]> wrote:
>> > On Fri, 2011-05-13 at 23:45 +0100, Jimmy O'Regan wrote:
>> >> On 13 May 2011 22:55, Paulo Schreiner <[email protected]> wrote:
>> >> > Does anyone here have experience with the apertium tagger?
>> >> >
>> >> > I have created (to the best of my knowledge) all the required
>> >> > resources, but got stuck on the following error:
>> >> >
>> >> > apertium-tagger -d -s 0 pt.expand pt.tagged.txt pt.tsx pt.prob pt.tagged pt.tagged.morf
>> >> > Calculating ambiguity classes...
>> >> >
>> >> > 30 states and 31 ambiguity classes
>> >> > Kupiec's initialization of transition and emission probabilities...
>> >> > Initializing transition and emission probabilities from a hand-tagged corpus...
>> >> > {adv}    Word: depois -- {prp,adv}       Word: depois
>> >> > Error: A new ambiguity class was found. I cannot continue.
>> >> > Word 'depois' not found in the dictionary.
>> >> > New ambiguity class: {prp,adv}
>> >> > Take a look at the dictionary, then retrain.
>> >>
>> >> 'depois' needs to be added to the dictionary (as both preposition and
>> >> adverb), to match the corpus. In all likelihood, the word is present
>> >> (otherwise it couldn't have encountered an ambiguity), so you'll
>> >> probably need to look at the commands in the Makefile that are used to
>> >> filter the output of lt-expand - it's discarding too much.
>> >>
>> >
>> > Like this? I sorted the expanded file, and the entries seem to be there:
>> >
>> > depois:depois<adv>
>> > Depois:depois<adv>
>> > depois:depois<prp>
>> > Depois:depois<prp>
>> >
>> > Any other idea?
>>
>> No need for another idea, because I'm right :P
>>
>> That's the wrong format. It should match the output of the analyser
>> (i.e., you should have entries like:
>> ^depois/depois<prp>/depois<adv>$
>> instead of what you have).
>>
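To make the difference between the two formats concrete, here is a minimal Python sketch that groups `surface:analysis` lines (the shape shown in the sorted listing above) into one `^surface/analysis/analysis$` unit per surface form. The function name `expand_to_stream` is hypothetical, and the sketch assumes every input line has exactly that plain `surface:analysis` shape. It only illustrates the target format; it is not how the Apertium Makefile builds the tagger dictionary, which the thread says should come from the analyser's own output.

```python
def expand_to_stream(lines):
    """Group 'surface:analysis' lines into '^surface/a1/a2$' units.

    Hypothetical helper: assumes each line is 'surface:lemma<tags>'
    with no ':' inside the surface form.
    """
    analyses = {}  # surface form -> list of analyses, in first-seen order
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first ':' only; the rest is the analysis.
        surface, _, analysis = line.partition(":")
        bucket = analyses.setdefault(surface, [])
        if analysis not in bucket:
            bucket.append(analysis)
    return ["^" + surface + "/" + "/".join(found) + "$"
            for surface, found in analyses.items()]


if __name__ == "__main__":
    sample = [
        "depois:depois<adv>",
        "Depois:depois<adv>",
        "depois:depois<prp>",
        "Depois:depois<prp>",
    ]
    for unit in expand_to_stream(sample):
        print(unit)
```

Each surface form ends up on a single line carrying all of its analyses, which is what lets the tagger see `{prp,adv}` as one ambiguity class instead of two unrelated entries.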
>
> In trying to change the format, I uncovered another error:
>
> lt-proc pt.automorf.bin pt.tagged.txt
>
> I soon get a std::exception when it encounters the "word"
> www.gpopai.usp.br/pesquisacl
>
> It's in the dictionary as:
> <e><p><l>www.gpopai.usp.br/pesquisacl</l><r>www.gpopai.usp.br/pesquisacl<s n="n"/></r></p></e>
>
> What am I doing wrong?

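You need to escape '/' in the input text, because it's a special
character in the stream format (it separates the analyses of a word).
Piping the text through apertium-destxt before lt-proc should be enough.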
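For illustration only, a minimal Python sketch of what "escaping" means here: each reserved character gets a backslash in front of it. The set of reserved characters in `RESERVED` is an assumption on my part, and `escape_stream` is a hypothetical name. In practice apertium-destxt is the tool to use; this sketch does not reproduce everything it does.

```python
# Assumed set of characters that are special in the Apertium stream
# format; check the Apertium documentation for the authoritative list.
RESERVED = set("^$/<>{}[]\\@")


def escape_stream(text):
    """Prefix every reserved character in text with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


if __name__ == "__main__":
    print(escape_stream("www.gpopai.usp.br/pesquisacl"))
```

With the slash escaped, the URL is read as one token instead of a surface form followed by a stray analysis.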

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
