Hi,

Right, if the `nbest` tool from CSLM is supposed to work with sparse
features, then it needs to read the names.

An n-best list entry with sparse feature scores may look like this:

0 ||| Orlando Bloom und Miranda Kerr noch lieben  ||| LexicalReordering0= 
-2.29848 0 0 0 -1.93214 0 0 0 
LexicalReordering0_phr-src-last-c200-cluster_162-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_41-0= 1 
LexicalReordering0_stk-tgt-last-c200-cluster_134-0= 1 
LexicalReordering0_phr-src-last-c200-cluster_189-0= 1 
LexicalReordering0_phr-tgt-first-c200-cluster_54-0= 3 
LexicalReordering0_phr-tgt-first-c200-cluster_34-0= 1 
LexicalReordering0_stk-src-first-c200-cluster_59-0= 3 
LexicalReordering0_phr-tgt-first-c200-cluster_134-0= 1 
LexicalReordering0_phr-tgt-last-c200-cluster_54-0= 3 
LexicalReordering0_phr-tgt-last-c200-cluster_34-0= 1 
LexicalReordering0_stk-src-last-c200-cluster_59-0= 3 
LexicalReordering0_phr-src-last-c200-cluster_126-0= 1 
LexicalReordering0_phr-tgt-first-c200-cluster_119-0= 1 
LexicalReordering0_phr-tgt-last-c200-cluster_134-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_59-0= 3 
LexicalReordering0_phr-src-last-c200-cluster_59-0= 3 
LexicalReordering0_stk-src-first-c200-cluster_162-0= 1 
LexicalReordering0_stk-src-first-c200-cluster_189-0= 1 
LexicalReordering0_stk-src-last-c200-cluster_162-0= 1 
LexicalReordering0_stk-src-last-c200-cluster_189-0= 1 
LexicalReordering0_stk-tgt-first-c200-cluster_34-0= 1 
LexicalReordering0_stk-tgt-first-c200-cluster_54-0= 3 
LexicalReordering0_phr-tgt-last-c200-cluster_133-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_162-0= 1 
LexicalReordering0_phr-src-first-c200-cluster_189-0= 1 
LexicalReordering0_stk-tgt-first-c200-cluster_134-0= 1 
LexicalReordering0_stk-tgt-last-c200-cluster_34-0= 1 
LexicalReordering0_stk-tgt-last-c200-cluster_54-0= 3 OpSequenceModel0= -31.707 
0 0 0 0 Distortion0= 0 LM0= -36.858 WordPenalty0= -7 PhrasePenalty0= 6 
TranslationModel0= -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 
4.99948 ||| -4.99724

There can be many thousands of different sparse features
"LexicalReordering0_*" that fire on one particular set and appear in
hypotheses that make it into the 100-best list.

The number of features can vary between n-best list entries.

It seems to me that the `nbest` tool from CSLM v3 cannot deal with this.
I had a brief look at the code, and I ran: 

$ nbest -i in.100best -o out.100best

(Without specifying any new weights.)

It processes the list but outputs this:

0 |||  Orlando Bloom und Miranda Kerr noch lieben   ||| 0 -2.29848 0 0 0 
-1.93214 0 0 0 0 1 0 1 0 1 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 1 0 1 0 3 0 3 
0 1 0 1 0 1 0 1 0 1 0 3 0 1 0 1 0 1 0 1 0 1 0 3 0 -31.707 0 0 0 0 0 0 0 -36.858 
0 -7 0 6 0 -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 4.99948 ||| 
-4.99724

I think it just takes every token in the scores column and treats it as
a dense score (even including the feature names). Probably nobody
bothered to adapt it to the current format yet.
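
To illustrate what I suspect is happening, here is a minimal Python sketch.
It is not taken from the CSLM source; `naive_dense_scores` is a hypothetical
name, and the assumption is simply that every whitespace-separated token is
converted to a number, with non-numeric tokens (the feature names) becoming 0:

```python
def naive_dense_scores(score_field):
    """Treat every token as a dense score; non-numeric tokens become "0".

    Assumption: this mimics an atof-style conversion. It is a guess at the
    behaviour, not the actual CSLM implementation.
    """
    out = []
    for tok in score_field.split():
        try:
            float(tok)          # numeric token: keep it unchanged
            out.append(tok)
        except ValueError:      # feature name such as "LM0=": becomes 0
            out.append("0")
    return out


if __name__ == "__main__":
    field = ("LexicalReordering0= -2.29848 0 0 0 "
             "LexicalReordering0_phr-src-last-c200-cluster_162-0= 1 "
             "LM0= -36.858")
    print(" ".join(naive_dense_scores(field)))
```

The extra zeros in the output above appear exactly where the feature names
were, which is consistent with this guess.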

It would be a minor modification I suppose. The tool just needs to read
and store feature names. Weights would have to be stored by name as
well. They would have to be read from a sparse weights file:

...
LexicalReordering0_btn-src-first-c200-cluster_119-3 0.00840371
LexicalReordering0_btn-src-first-c200-cluster_12-2 0.000442284
LexicalReordering0_btn-src-first-c200-cluster_12-3 0.00182486
LexicalReordering0_btn-src-first-c200-cluster_120-2 5.34991e-06
LexicalReordering0_btn-src-first-c200-cluster_120-3 0.0143345
...
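
Reading such a file into a name-to-weight map could look roughly like this.
This is a sketch in Python rather than in the tool's own code, and
`read_sparse_weights` is a hypothetical helper; it assumes one
"name value" pair per line, separated by whitespace:

```python
def read_sparse_weights(lines):
    """Map sparse feature names to weights.

    Assumption: each useful line is "<feature name> <weight>"; anything
    else (blank lines, a literal "...") is skipped.
    """
    weights = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        weights[name] = float(value)
    return weights


if __name__ == "__main__":
    sample = [
        "LexicalReordering0_btn-src-first-c200-cluster_119-3 0.00840371",
        "LexicalReordering0_btn-src-first-c200-cluster_120-2 5.34991e-06",
    ]
    for name, weight in read_sparse_weights(sample).items():
        print(name, weight)
```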

Is CSLM on GitHub? If you don't have a more recent version of the nbest
tool, and nobody else has anything equivalent, then I might take your
code base and just add the few bits that are missing in your tool. It
can be implemented quickly, I'm sure.

I don't want to add any new feature scores using the tool. I only want
to utilize it in order to calculate new overall scores given a weights
file with sparse features, and then to reorder the n-best list entries.
Not a big deal.
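
Roughly, what I have in mind is the following. This is only a Python sketch
under some assumptions, not a patch for the `nbest` tool: the function names
are made up, the weights are assumed to be available as a dict from feature
name to a list of values (one value for a sparse feature, several for a
dense one such as TranslationModel0), and a feature without a weight simply
contributes nothing to the score:

```python
def parse_scores(field):
    """Split a score field into {feature_name: [values]}.

    A token ending in "=" starts a new feature; the numbers that follow
    belong to it.
    """
    feats, name = {}, None
    for tok in field.split():
        if tok.endswith("="):
            name = tok[:-1]
            feats.setdefault(name, [])
        elif name is not None:
            feats[name].append(float(tok))
    return feats


def rescore(line, weights):
    """Return (sentence id, hypothesis, score field, new total score)."""
    fields = [f.strip() for f in line.split("|||")]
    sent_id, hyp, field = fields[:3]    # any further fields are ignored
    total = 0.0
    for name, values in parse_scores(field).items():
        w = weights.get(name, [])       # missing weight: contributes 0
        total += sum(wi * vi for wi, vi in zip(w, values))
    return sent_id, hyp, field, total


def rerank(lines, weights):
    """Rescore all entries and sort each sentence's list, best first."""
    by_sentence = {}
    for line in lines:
        if line.strip():
            entry = rescore(line, weights)
            by_sentence.setdefault(entry[0], []).append(entry)
    out = []
    for entries in by_sentence.values():    # keeps input sentence order
        for sid, hyp, field, total in sorted(entries, key=lambda e: -e[3]):
            out.append(f"{sid} ||| {hyp} ||| {field} ||| {total:g}")
    return out


if __name__ == "__main__":
    nbest = [
        "0 ||| a ||| LM0= -2 Sparse_x= 1 ||| -2",
        "0 ||| b ||| LM0= -1 ||| -1",
    ]
    for entry in rerank(nbest, {"LM0": [1.0], "Sparse_x": [5.0]}):
        print(entry)
```

Because features are looked up by name, entries with different numbers of
sparse features are handled without any special casing.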

Basically, I would think that there should be some functioning tool
readily available for such a seemingly common task. But I'm not aware of
any. Maybe people code a new Perl script for this task on demand each
time they need it? Or maybe some individual piece of code in the Moses
tuning pipeline does this, and only this?

Cheers,
Matthias




On Fri, 2015-03-27 at 23:48 +0100, Holger Schwenk wrote:
> Hello Matthias,
> 
> could you give us an idea what is missing in the CSLM reranker to make 
> it work for sparse features ?
> 
> Right now, we do not parse the names of the feature functions and store 
> the numerical values only.
> In principle, this could be changed ...
> 
> Then it depends how you want to rescore the sparse features.
> The CSLM toolkit can rescore with a back-off LM and Moses on-disk 
> phrase tables (and obviously neural networks).
> 
> Why not add more functionality ...
> 
> - Holger
> 
> On 03/27/2015 11:42 PM, Matthias Huck wrote:
> > Hi,
> >
> > I'm looking for a tool to rerank n-best lists in Moses' current format,
> > including sparse features. The CSLM toolkit has quite a nice re-ranker
> > implementation, but apparently it doesn't know sparse features yet.
> >
> > If anyone already has an extended version of the existing re-ranker from
> > the CSLM toolkit, or alternatively any other code that does the same and
> > can also deal with sparse features, please let me know. I'd prefer to
> > not spend any time at all on implementing this myself, as I'll probably
> > need to run it only a few times for testing purposes.
> >
> > Cheers,
> > Matthias
> >
> >
> >> On 29 Apr 20:46 2013, Holger Schwenk wrote:
> >>
> >> Hello,
> >>
> >> you can do n-best list rescoring with the nbest tool which is part of
> >> the CSLM toolkit (http://www-lium.univ-lemans.fr/~cslm/)
> >> It is designed to rescore with back-off or continuous space LMs, but it
> >> shouldn't be difficult to add your own feature functions.
> >>
> >> don't hesitate to contact me if you need help.
> >>
> >> best,
> >>
> >> Holger
> >
> >
> 
> 



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


