On 7 June 2014 07:08, Darshak Parikh <[email protected]> wrote:
> Hello everyone,
>
> I am trying to do supervised tagger training on en-eo.
> $ make -f en-eo-supervised.make
>
> It was working fine, with the usual errors (punctuation marks and multiwords
> not being recognised, etc.) and me correcting those errors to drive the
> training further.
>
> At one point, around 13%, it freezes. After a few seconds, it aborts with an
> error.
>
> Here are the last five lines of the output:
>
> {PREP}   Word: by -- {PREP}      Word: by
> {ADJ}    Word: gradual -- {ADJ}          Word: gradual
> {NOMSG}          Word: cooling -- {NOMSG,GER}    Word: cooling
> Aborted (core dumped)
> make: *** [en-eo.prob] Error 134

It would be helpful to show what the next word was, as that's
presumably what caused the crash.

In en.tagged I see:
^by/by<pr>$
^gradual/gradual<adj>$
^cooling/cooling<n><sg>$
^<97>/<97><guio>$

while in en.untagged I see:
^by/by<pr>$ ^gradual/gradual<adj>$
^cooling/cool<vblex><ger>/cooling<n><sg>$ ^—/—<guio>$

This suggests that the most likely cause of the problem is an encoding
mismatch between the tagged and untagged text. The <97> in en.tagged
looks like the raw byte 0x97, which is the em dash in Windows-1252,
while en.untagged has the em dash in UTF-8. At this point the tagger
does not see a matching entry, and crashes.
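
To see where a file stops being valid UTF-8, something like the
following rough sketch might help. It is not part of Apertium; the
function name find_non_utf8_lines is made up for illustration, and it
only assumes the file can be read as raw bytes.

```python
import sys


def find_non_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            # This line contains bytes that cannot be UTF-8, e.g. a lone 0x97.
            bad.append((number, line))
    return bad


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_bad_lines.py en.tagged
    with open(sys.argv[1], "rb") as handle:
        for number, line in find_non_utf8_lines(handle.read()):
            print(number, repr(line))
```

Any line it reports in en.tagged is a candidate for the mismatch.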

Convert the en.tagged file to UTF-8 and see if the problem persists.
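
A minimal sketch of one way to do the conversion, assuming the stray
bytes are Windows-1252 (a guess based on the 0x97 above, not something
I have verified for the whole file). The name to_utf8 and the output
file name are made up. Lines that are already valid UTF-8 are left
alone, so a file with mixed encodings is not double-encoded.

```python
import sys


def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode as UTF-8 every line that is not already valid UTF-8."""
    converted = []
    for line in data.split(b"\n"):
        try:
            line.decode("utf-8")
            converted.append(line)  # already valid UTF-8, keep unchanged
        except UnicodeDecodeError:
            # Assumed source encoding; raises if a byte is undefined in it.
            converted.append(line.decode(source_encoding).encode("utf-8"))
    return b"\n".join(converted)


if __name__ == "__main__":
    # Usage (hypothetical script name): python to_utf8.py en.tagged en.tagged.utf8
    with open(sys.argv[1], "rb") as source:
        result = to_utf8(source.read())
    with open(sys.argv[2], "wb") as target:
        target.write(result)
```

Writing to a separate file keeps the original en.tagged around in case
the encoding guess turns out to be wrong.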

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
