2011/1/23 Francis Tyers <[email protected]>

> On Sun, 23 Jan 2011 at 07:28 +0300, Hèctor Alòs i Font wrote:
> > I'm not able to generate a new prob file. I keep having the same
> > problem when the programme begins to run Kupiec's algorithm (i.e. just
> > after it finishes generating the fr.crp file). The error persists
> > with different training corpora. The dictionary has more than
> > 40,000 entries and the corpora have 800-900 M words (with one of them I
> > could generate the current prob file six months ago). Coverage is c.
> > 96%. This is the output I receive:
> >
> > make -f fr-eo-unsupervised.make
> >
> > Generating fr-tagger-data/fr.dic
> > This may take some time. Please, take a cup of coffee and come back later.
> > apertium-validate-dictionary apertium-eo-fr.fr.dix
> > apertium-validate-tagger apertium-eo-fr.fr.tsx
> > lt-expand apertium-eo-fr.fr.dix | grep -v "__REGEXP__" | grep -v ":<:" |\
> >     awk 'BEGIN{FS=":>:|:"}{print $1 ".";}' | apertium-destxt >fr.dic.expanded
> > lt-proc -a fr-eo.automorf.bin <fr.dic.expanded | \
> >     apertium-filter-ambiguity apertium-eo-fr.fr.tsx > fr-tagger-data/fr.dic
> > rm fr.dic.expanded;
> > apertium-destxt < fr-tagger-data/fr.crp.txt | lt-proc fr-eo.automorf.bin > fr-tagger-data/fr.crp
> > apertium-validate-tagger apertium-eo-fr.fr.tsx
> > apertium-tagger -t 8 \
> >                            fr-tagger-data/fr.dic \
> >                            fr-tagger-data/fr.crp \
> >                            apertium-eo-fr.fr.tsx \
> >                            fr-eo.prob;
> > Calculating ambiguity classes...
> >
> > 90 states and 420 ambiguity classes
> > Kupiec's initialization of transition and emission probabilities...
> > make: *** [fr-eo.prob] Error 1
> >
> >
> > Any idea?
> > Thanks in advance.
> > Hèctor
> >
> > PS
> > The fr.crp file generated at the beginning of the process seems
> > very small to me: just 390 lines. If it is meant to be a list of all
> > ambiguous forms, it should have thousands of them.
>
> Hey Hèctor, it could be that that file is the ambiguity class file...
>
> Can you upload the corpus somewhere so that we can download it and
> check it ourselves? Alternatively, could you run the training with
> apertium-tagger through gdb to find out exactly where in the code it
> fails?
>
> Fran
>
>
Thanks, Fran. I've put a corpus at: http://tinyurl.com/4jkmko3

If there is an explanation somewhere of how to run the tagger through gdb,
I can try.
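
For what it's worth, a minimal sketch of what such a gdb run might look
like. The tagger command is copied from the make output quoted above; the
rest is an assumption: that gdb is installed, that apertium-tagger was
built with debugging symbols, and that the command is run from the
directory containing fr-eo-unsupervised.make. The variable names are only
illustrative.

```shell
# Sketch only. The tagger arguments are taken from the make output above.
TAGGER_CMD="apertium-tagger -t 8 fr-tagger-data/fr.dic fr-tagger-data/fr.crp apertium-eo-fr.fr.tsx fr-eo.prob"

# --batch   exit gdb once the commands below have finished
# -ex run   start the program
# -ex bt    print a backtrace if the program stopped on a crash
# --args    everything after this is the program and its arguments
GDB_CMD="gdb --batch -ex run -ex bt --args $TAGGER_CMD"

# Print the full command for inspection; to actually run it: eval "$GDB_CMD"
echo "$GDB_CMD"
```

If the tagger crashes (e.g. with a segmentation fault), the backtrace
should show where in the code it stopped. If it simply exits with status 1,
gdb will only report the exit code and there will be no stack to print.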

Hèctor
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
