Hi,
On Mon, Sep 26, 2011 at 5:40 PM, marco turchi <[email protected]> wrote:
> corpus-coverage-summary and ttable-coverage-summary:
> what does each column represent?
- n-gram order
- number of occurrences in corpus/t-table
- number of distinct phrases in the test set with this number of
  occurrences ("type")
- total number of phrases in the test set with this number of
  occurrences ("token")
For the low occurrence counts, these numbers are also reported at the top
of the web page.
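To illustrate the type/token distinction, here is a minimal Python sketch.
It is not the actual Moses analysis code: the function name
coverage_summary and the input structures (a list of test-set phrases and
a dict of corpus counts) are hypothetical, chosen only for illustration.

```python
from collections import Counter


def coverage_summary(test_phrases, corpus_counts):
    """Group test-set phrases by (n-gram order, occurrences in corpus).

    test_phrases  -- hypothetical input: every phrase instance in the test set
    corpus_counts -- hypothetical input: phrase -> count in corpus (or t-table)
    Returns a dict: (order, occurrences) -> (types, tokens).
    """
    token_freq = Counter(test_phrases)  # how often each phrase occurs in the test set
    summary = {}
    for phrase, tokens in token_freq.items():
        order = len(phrase.split())               # n-gram order
        occurrences = corpus_counts.get(phrase, 0)  # unseen phrases count as 0
        types_so_far, tokens_so_far = summary.get((order, occurrences), (0, 0))
        # each distinct phrase adds one "type"; each instance adds one "token"
        summary[(order, occurrences)] = (types_so_far + 1, tokens_so_far + tokens)
    return summary


if __name__ == "__main__":
    phrases = ["the house", "the house", "a cat", "house"]
    counts = {"the house": 3, "house": 10}
    for (order, occ), (types, tokens) in sorted(coverage_summary(phrases, counts).items()):
        print(order, occ, types, tokens)
```

Each printed row follows the column order listed above: n-gram order,
occurrences, types, tokens.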
> ttable-coverage-by-phrase:
> I suppose that the second column is the number of source phrases in the tt
> table where that particular phrase appears, but what is it the third column?
> is the translation entropy?
Yes, translation entropy based on normalized forward phrase
translation probability.
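As a rough sketch of what that means (not the code Moses actually runs):
take the forward probabilities of all translations of one source phrase,
renormalize them so they sum to 1, and compute the entropy. The function
name and the choice of log base 2 are assumptions made for this example.

```python
import math


def translation_entropy(forward_probs):
    """Entropy of the forward translation probabilities of one source phrase.

    forward_probs -- probabilities of each translation option for the phrase;
                     they are renormalized here so that they sum to 1.
    Log base 2 is an assumption; the base used by the tool may differ.
    """
    total = sum(forward_probs)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for p in forward_probs:
        if p > 0:
            q = p / total  # normalized probability
            entropy -= q * math.log2(q)
    return entropy


if __name__ == "__main__":
    print(translation_entropy([0.5, 0.5]))  # two equally likely translations
    print(translation_entropy([1.0]))       # a single, unambiguous translation
```

A phrase with one dominant translation gets an entropy near 0, while a
phrase with many equally likely translations gets a high value.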
> input-annotation:
> which information is reported after each sentence?
For each span over the input sentence:
- span range
- count in corpus
- count in ttable (number of distinct translations)
- translation table entropy
These values are the basis of the color-coded visualization of the input
sentence on the web page.
-phi