I think you'd be better off implementing your own
StatefulFeatureFunction, bypassing LanguageModel.{h,cpp}, which mostly
handles n-grams crossing phrase boundaries, and calling a
LanguageModelImplementation as the backend. You'll probably want larger
beams too.
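
Untested sketch of what I have in mind. StatefulFeatureFunction,
FFState, and LanguageModelImplementation are real classes, but check
FeatureFunction.h for the exact signatures in your checkout;
PhraseLabelState, PhraseLabelLM, and the member names here are made
up:

  // (plus the usual Moses headers: FFState.h, FeatureFunction.h,
  //  Hypothesis.h, ScoreComponentCollection.h, ...)
  #include <algorithm>
  #include <vector>

  // State: first-word labels of the n-1 most recent phrases.
  class PhraseLabelState : public FFState {
  public:
    std::vector<const Factor*> labels;
    virtual int Compare(const FFState& other) const {
      const PhraseLabelState& o =
          static_cast<const PhraseLabelState&>(other);
      if (labels < o.labels) return -1;
      return o.labels < labels ? 1 : 0;
    }
  };

  class PhraseLabelLM : public StatefulFeatureFunction {
    LanguageModelImplementation* m_backend;  // SRILM, IRSTLM, ...
    FactorType m_labelFactor;                // which factor holds labels
    size_t m_order;                          // n
  public:
    virtual const FFState* EmptyHypothesisState(const InputType&) const {
      return new PhraseLabelState();  // nothing translated yet
    }
    virtual FFState* Evaluate(const Hypothesis& hypo,
                              const FFState* prev,
                              ScoreComponentCollection* out) const {
      const PhraseLabelState& p =
          *static_cast<const PhraseLabelState*>(prev);
      // Label factor of the first word of the phrase being appended.
      const Factor* label = hypo.GetCurrTargetPhrase()
                                .GetWord(0).GetFactor(m_labelFactor);
      std::vector<const Factor*> ngram(p.labels);
      ngram.push_back(label);
      // ... query m_backend on `ngram`, out->PlusEquals(this, score) ...
      // The new state keeps only the most recent n-1 labels.
      PhraseLabelState* next = new PhraseLabelState();
      size_t keep = std::min(ngram.size(), m_order - 1);
      next->labels.assign(ngram.end() - keep, ngram.end());
      return next;
    }
  };

One Evaluate call per hypothesis extension, state comparison for
recombination, and the backend does the actual n-gram lookups.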
Kenneth
On 03/18/11 13:38, Dennis Mehay wrote:
> Hello all,
>
> I am trying to do something rather fancy with Moses by modifying the
> way it uses LMs. What I want to do is somewhat akin to the
> "LanguageModelSkip.h" code in the repository, in that I want to score
> sequences over only certain factors of the string (to extend the LM's
> reach and, hopefully, better approximate syntactic or dependency
> LMs). What I have is a way of getting a single label for each entry
> in the phrase table (yes, it sounds crazy, but I managed to pull it
> off). I have distributed this label (identically) to each word in the
> MT phrase, and so I want to feed the LM (1) the syntactic-label
> factor of the first word of the current phrase and (2) the label
> factors of the first words of the n-1 previous *phrases* (NOT
> *words*) in the search hypothesis that the current phrase is
> extending. This essentially tells the LM the syntactic labels of the
> n phrases that make up the current search hypothesis.
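>
> For example, with a trigram label LM (n=3): if the two most recent
> phrases in a hypothesis start with words labeled (say) NP and VP,
> then extending it with a phrase whose first word is labeled PP should
> cost roughly p(PP | NP VP).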
>
> This seems like it should be straightforward. I know I'll need to
> override the "Evaluate" and "CalcScore" member functions of the
> LanguageModel class (they compute the inter-phrase and intra-phrase
> LM scores, right?), but I also see from comments in the code that I
> shouldn't access previous hypotheses directly from the Evaluate
> function -- apparently that will get me into "trouble". Instead, I
> need to pass the n-1 previous phrases along in the FFState argument
> to Evaluate. (These comments come from the online code documentation
> and aren't in my checked-out repository, so they could be out of
> date.)
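>
> (I assume the "trouble" is hypothesis recombination: Moses merges two
> hypotheses when their FFStates compare equal, so any history a
> feature depends on has to live in the state rather than be dug out of
> the hypothesis chain.)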
>
> This is similar to what the IRST LM asynchronous-LM idea buys you,
> but without limiting what is fed to the LM to a fixed-length *word*
> window (the "<lmmacroSize>" parameter in the IRST LM chunkLM config
> file). The way I plan to implement things, IRST LM and SRILM will
> both be usable as back-end LMs -- all of the work will be done by
> tracking what the n-1 previous phrases are in each hypothesis.
>
> My question, then, is (at least) two-fold: (1) Is this the best way
> to go about this (where "this" is my whole crazy idea)? And (2) if
> so, am I right in thinking that (in addition to adding an LM type to
> the LanguageModelFactory class) all I need to do is override the
> "Evaluate" and "CalcScore" member functions?
>
> Or am I completely off-base? (Or is this not really even possible at all?)
>
> Any help is much appreciated.
>
> Best,
> D.N.
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support