"Jimmy O'Regan" <jore...@gmail.com>
writes:

> On 15 December 2011 10:13, Francis Tyers <fty...@prompsit.com> wrote:
>> On Thu, 15 Dec 2011 at 10:42 +0100, Kevin Brubeck Unhammer
>> wrote:
>>> "Jimmy O'Regan" <jore...@gmail.com>
>>> writes:
>>>
>>> > On 14 December 2011 20:19, Pim Otte <otte....@gmail.com> wrote:
>>> >> I'm not sure how I should get the output of the analyser.
>>> >>
>>> >> but running the makefile itself results in an empty af-tagger-data/af.dic
>>> >>
>>> >> running this line: after creating af.dic.expand gives usage on lt-proc
>>> >> usage "lt-proc -e -w -a af-nl.automorf.bin < af.dic.expanded"
>>> >>
>>> >
>>> > Well, there's your problem. Usage prints to stderr, hence empty file.
>>> >
>>> >> Any pointers?
>>> >
>>> > -a is the mode switch, it should be the first option. -w is completely
>>> > superfluous for tagger training, get rid of it. If you want to train a
>>> > tagger that's aware of pin-the-tail-on-the-compound mode, you'll
>>> > probably have to do something extra, because (IIRC) it's only invoked
>>> > when it encounters words that are not in the dictionary, which will
>>> > never be the case on an expansion of the dictionary - so either
>>> > manually add a bunch of compounds, or get rid of that, too.
>>>
>>> -e is the compound thing, -w just ensures lemmas don't get surface case
>>> applied (I guess that's pointless too though?)
>>
>> Do you think the error might be because it finds a word which has a
>> compound analysis, but that isn't in the dictionary ?
>
> That shouldn't happen, because the input is the expansion of the
> dictionary. If it is the case, it's most likely that the filtering of
> the expansion is faulty.
>
> But that's beside the point, the problem is that the options, as
> specified, are triggering the usage information. This could be because
> 1) -a needs to be first; or 2) some conflict among -a, -w, -e. If it's
> a conflict between -a and -w and/or -e, then that's a bug in the
> option handling in lt-proc, and someone who cares about -w and -e
> should fix it (i.e., it ain't gonna be me).
>
> If it's 2), my point is that the bug can be worked around by simply
> omitting -w and -e, because they do nothing -- or omit -a, because
> it's the default mode. Whatever works. I'm sure that -w does nothing
> in this context, but I'm not entirely sure about -e - my recollection
> is that it is engaged if and only if there is no dictionary analysis
> of the word, which, see above, should not happen. I don't know, I've
> never used it.
>
> Leading from that, if you want to train the tagger to have some
> awareness of these guesses at compounds, then the tagger dictionary
> will need to contain material other than the expansion of the
> dictionary.
>
> $ echo foo |lt-proc -w -a   en-es.automorf.bin
> ^foo/*foo$
>
> It's not 1)
>
> $ echo foo |lt-proc -e -a   en-es.automorf.bin
> lt-proc: process a stream with a letter transducer
> [SNIP]
>
> It's a conflict between -e and -a.

I think that's because -e can be seen as a replacement for -a: it is
another "main mode" (and it doesn't make sense to use it with -b or -g
either), so I'd say it's not a bug.
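
To illustrate what I mean by "main mode", here is a rough sketch using a
made-up stand-in function, fake_lt_proc. It is not lt-proc's actual option
handling; it only models my assumption that -a, -b, -g and -e each select a
mode and that at most one of them may be given. Whether the real tool accepts
-w alongside -e is also an assumption here.

```shell
# Hypothetical stand-in for lt-proc's option check; NOT the real implementation.
# Assumption: -a, -b, -g and -e each select a "main mode", and only one is allowed.
fake_lt_proc() {
    modes=0
    for opt in "$@"; do
        case "$opt" in
            -a|-b|-g|-e) modes=$((modes + 1)) ;;
        esac
    done
    if [ "$modes" -gt 1 ]; then
        # Usage goes to stderr, so a redirected stdout stays empty.
        echo "usage: fake_lt_proc [-a|-b|-g|-e] [-w] fst_file" >&2
        return 1
    fi
    echo "ok"
}

# One mode switch plus -w: accepted by this model.
fake_lt_proc -w -a en-es.automorf.bin; echo "status: $?"
# Two mode switches: rejected, usage printed (silenced here).
fake_lt_proc -e -a en-es.automorf.bin 2>/dev/null; echo "status: $?"
# Dropping -a and keeping -e: accepted by this model.
fake_lt_proc -e -w en-es.automorf.bin; echo "status: $?"
```

Under that model, the fix for the original command line is simply to pick one
of -a or -e, not both.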


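As for the empty af.dic: since the usage text goes to stderr, redirecting
stdout to a file hides the failure. A small sketch of how a build step could
catch that, again with a hypothetical stand-in (usage_only) in place of the
real command:

```shell
# Hypothetical stand-in for a command that only prints its usage text and fails.
usage_only() {
    echo "usage: some-tool [options] file" >&2
    return 1
}

out=$(mktemp)

# Only stdout is redirected; the usage text still appears on the terminal.
if usage_only > "$out"; then
    echo "command succeeded"
else
    echo "command failed with status $?"
fi

# -s is true only for a file that exists and is non-empty.
if [ -s "$out" ]; then
    echo "output written"
else
    echo "output is empty: check stderr and the exit status"
fi

rm -f "$out"
```

Checking the exit status (or the file size) right after the step would have
pointed at the option problem straight away.
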
-- 
Kevin Brubeck Unhammer


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
