Hi,

Right, if the `nbest` tool from CSLM is supposed to work with sparse
features, then it needs to read the feature names.

An n-best list entry with sparse feature scores may look like this:

0 ||| Orlando Bloom und Miranda Kerr noch lieben ||| LexicalReordering0= -2.29848 0 0 0 -1.93214 0 0 0 LexicalReordering0_phr-src-last-c200-cluster_162-0= 1 LexicalReordering0_phr-src-first-c200-cluster_41-0= 1 LexicalReordering0_stk-tgt-last-c200-cluster_134-0= 1 LexicalReordering0_phr-src-last-c200-cluster_189-0= 1 LexicalReordering0_phr-tgt-first-c200-cluster_54-0= 3 LexicalReordering0_phr-tgt-first-c200-cluster_34-0= 1 LexicalReordering0_stk-src-first-c200-cluster_59-0= 3 LexicalReordering0_phr-tgt-first-c200-cluster_134-0= 1 LexicalReordering0_phr-tgt-last-c200-cluster_54-0= 3 LexicalReordering0_phr-tgt-last-c200-cluster_34-0= 1 LexicalReordering0_stk-src-last-c200-cluster_59-0= 3 LexicalReordering0_phr-src-last-c200-cluster_126-0= 1 LexicalReordering0_phr-tgt-first-c200-cluster_119-0= 1 LexicalReordering0_phr-tgt-last-c200-cluster_134-0= 1 LexicalReordering0_phr-src-first-c200-cluster_59-0= 3 LexicalReordering0_phr-src-last-c200-cluster_59-0= 3 LexicalReordering0_stk-src-first-c200-cluster_162-0= 1 LexicalReordering0_stk-src-first-c200-cluster_189-0= 1 LexicalReordering0_stk-src-last-c200-cluster_162-0= 1 LexicalReordering0_stk-src-last-c200-cluster_189-0= 1 LexicalReordering0_stk-tgt-first-c200-cluster_34-0= 1 LexicalReordering0_stk-tgt-first-c200-cluster_54-0= 3 LexicalReordering0_phr-tgt-last-c200-cluster_133-0= 1 LexicalReordering0_phr-src-first-c200-cluster_162-0= 1 LexicalReordering0_phr-src-first-c200-cluster_189-0= 1 LexicalReordering0_stk-tgt-first-c200-cluster_134-0= 1 LexicalReordering0_stk-tgt-last-c200-cluster_34-0= 1 LexicalReordering0_stk-tgt-last-c200-cluster_54-0= 3 OpSequenceModel0= -31.707 0 0 0 0 Distortion0= 0 LM0= -36.858 WordPenalty0= -7 PhrasePenalty0= 6 TranslationModel0= -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 4.99948 ||| -4.99724

There can be many thousands of different sparse features
"LexicalReordering0_*" which fire on one particular set, in hypotheses
which make it into the 100-best list. The number of features in
different n-best list entries can vary.

It seems to me that the `nbest` tool from CSLM v3 cannot deal with this.
I had a brief look at the code, and I ran:

$ nbest -i in.100best -o out.100best

(Without specifying any new weights.)

It processes the list but outputs this:

0 ||| Orlando Bloom und Miranda Kerr noch lieben ||| 0 -2.29848 0 0 0 -1.93214 0 0 0 0 1 0 1 0 1 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 3 0 1 0 1 0 1 0 3 0 3 0 1 0 1 0 1 0 1 0 1 0 3 0 1 0 1 0 1 0 1 0 1 0 3 0 -31.707 0 0 0 0 0 0 0 -36.858 0 -7 0 6 0 -4.56369 -17.4541 -4.49325 -6.47188 0.999896 0 0 0 0 0 4.99948 ||| -4.99724

I think it just takes every token in the scores column and treats it as
a dense score (even the feature names, which come out as 0). Probably
nobody bothered to adapt it to the current format yet.

It would be a minor modification, I suppose. The tool just needs to read
and store feature names. Weights would have to be stored by name as
well. They would have to be read from a sparse weights file:

...
LexicalReordering0_btn-src-first-c200-cluster_119-3 0.00840371
LexicalReordering0_btn-src-first-c200-cluster_12-2 0.000442284
LexicalReordering0_btn-src-first-c200-cluster_12-3 0.00182486
LexicalReordering0_btn-src-first-c200-cluster_120-2 5.34991e-06
LexicalReordering0_btn-src-first-c200-cluster_120-3 0.0143345
...

Is CSLM on GitHub? If you don't have a more recent version of the nbest
tool, and nobody else has anything equivalent, then I might take your
code base and just add the few bits that are missing in your tool. It
can be implemented quickly, I'm sure.

I don't want to add any new feature scores using the tool. I only want
to use it to calculate new overall scores given a weights file with
sparse features, and then to reorder the n-best list entries. Not a big
deal.

Basically, I would think that there should be some functioning tool
readily available for such a seemingly common task. But I'm not aware of
any. Maybe people code a new Perl script for this task on demand each
time they need it? Or maybe some individual piece of code in the Moses
tuning pipeline does this, and only this?

Cheers,
Matthias


On Fri, 2015-03-27 at 23:48 +0100, Holger Schwenk wrote:
> Hello Matthias,
>
> could you give us an idea what is missing in the CSLM reranker to make
> it work for sparse features?
>
> Right now, we do not parse the names of the feature functions and store
> the numerical values only.
> In principle, this could be changed ...
>
> Then it depends how you want to rescore the sparse features.
> The CSLM toolkit can rescore with a back-off LM and Moses on-disk
> phrase tables (and obviously neural networks).
>
> Why not add more functionality ...
>
> - Holger
>
> On 03/27/2015 11:42 PM, Matthias Huck wrote:
> > Hi,
> >
> > I'm looking for a tool to rerank n-best lists in Moses' current format,
> > including sparse features. The CSLM toolkit has quite a nice re-ranker
> > implementation, but apparently it doesn't know sparse features yet.
> >
> > If anyone already has an extended version of the existing re-ranker from
> > the CSLM toolkit, or alternatively any other code that does the same and
> > can also deal with sparse features, please let me know. I'd prefer to
> > not spend any time at all on implementing this myself, as I'll probably
> > need to run it only a few times for testing purposes.
> >
> > Cheers,
> > Matthias
> >
> >
> >> On 29 Apr 20:46 2013, Holger Schwenk wrote:
> >>
> >> Hello,
> >>
> >> you can do n-best list rescoring with the nbest tool which is part of
> >> the CSLM toolkit (http://www-lium.univ-lemans.fr/~cslm/)
> >> It is designed to rescore with back-off or continuous space LMs, but it
> >> shouldn't be difficult to add your own feature functions.
> >>
> >> don't hesitate to contact me if you need help.
> >>
> >> best,
> >>
> >> Holger
> >
> >
> >

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
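
For illustration, the rescoring step described in the reply at the top
(read feature names, look up weights by name, recompute the overall
score, reorder the entries) could be sketched roughly as below. This is
not code from CSLM or Moses, and all function names are hypothetical.
It assumes that every feature name in the scores column is a token
ending in "=" followed by one or more values, that the weights file has
one "name weight [weight ...]" entry per line (dense features with
several values are assumed to list one weight per value, which may not
match how your setup stores dense weights), and that a feature without
a weight contributes nothing.

```python
from collections import defaultdict


def parse_scores(field):
    """Split a scores column into {feature_name: [values]}.

    Assumes each feature name is a token ending in '=' and that the
    numeric tokens after it belong to that feature.
    """
    features = {}
    name = None
    for token in field.split():
        if token.endswith("="):
            name = token[:-1]
            features.setdefault(name, [])
        elif name is None:
            raise ValueError("score value before any feature name: " + token)
        else:
            features[name].append(float(token))
    return features


def parse_weights(lines):
    """Read weights by name from lines of the form 'name w1 [w2 ...]'.

    A trailing '=' on the name is tolerated (assumption, not a spec).
    """
    weights = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        weights[parts[0].rstrip("=")] = [float(x) for x in parts[1:]]
    return weights


def total_score(features, weights):
    """Weighted sum over named features; unknown features count as 0."""
    total = 0.0
    for name, values in features.items():
        w = weights.get(name)
        if w is None:
            continue  # assumption: a missing weight means weight 0
        if len(w) != len(values):
            raise ValueError("weight/value count mismatch for " + name)
        total += sum(wi * vi for wi, vi in zip(w, values))
    return total


def rerank(nbest_lines, weights):
    """Recompute overall scores and sort each sentence's entries."""
    by_sentence = defaultdict(list)
    for line in nbest_lines:
        fields = [f.strip() for f in line.rstrip("\n").split("|||")]
        sent_id, hyp, scores = fields[0], fields[1], fields[2]
        score = total_score(parse_scores(scores), weights)
        by_sentence[sent_id].append((score, hyp, scores))
    out = []
    for sent_id, entries in by_sentence.items():  # keeps input order of ids
        entries.sort(key=lambda e: e[0], reverse=True)
        for score, hyp, scores in entries:
            out.append(f"{sent_id} ||| {hyp} ||| {scores} ||| {score:g}")
    return out
```

To use it, one would read the weights file into parse_weights(), pass
the n-best file's lines to rerank(), and write the returned lines out.
Storing features in a dictionary keyed by name is what lets the number
of sparse features differ from entry to entry.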
