The intent of this code was clearly to output every factor for every token.

Given how outputFactorOrder[i] is used, removing this line is not a good idea: many tokens now get no factor output at all, and if a token can lack just one of its factors, the remaining factors would be printed in the wrong positions.

It might have been better to keep emitting a value, just not UNK, which stands for "unknown token". Something like UND, standing for "undefined factor", would fit better.

A complete solution would be to have every required factor actually defined, with a properly set value, for each token...
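
To illustrate the difference between skipping a missing factor and writing a placeholder, here is a minimal, self-contained sketch. It does not use the Moses classes: Token, FormatToken and the "UND" placeholder are hypothetical names made up for this example, and a null pointer simply stands in for a missing Factor.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one output token: the surface form plus its
// extra factors. A null pointer plays the role of a missing Factor.
struct Token {
  std::string surface;
  std::vector<const std::string *> factors;
};

// Formats a token as surface|factor1|factor2|...
// If `placeholder` is null, a missing factor is skipped entirely.
// Otherwise the placeholder is written in the missing factor's slot.
std::string FormatToken(const Token &tok, const std::string &delim,
                        const std::string *placeholder) {
  assert(!delim.empty());  // an empty delimiter would make the output ambiguous
  std::ostringstream out;
  out << tok.surface;
  for (const std::string *factor : tok.factors) {
    if (factor) {
      out << delim << *factor;
    } else if (placeholder) {
      out << delim << *placeholder;
    }
    // else: nothing is written, so every later factor shifts left by one slot
  }
  return out.str();
}
```

For a token "house" whose first extra factor is missing and whose second is "NN", skipping gives house|NN, so NN lands in the first factor slot. Passing "UND" as the placeholder gives house|UND|NN and keeps the positions aligned.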

On 15/06/2017 at 15:27, Hieu Hoang wrote:
thanks. Committed
https://github.com/moses-smt/mosesdecoder/commit/4b0560b5c9bd95d7c55cb0451e8947de0eee1d6d

* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/


On 15 June 2017 at 14:07, Etienne Monneret (LM) <[email protected] <mailto:[email protected]>> wrote:

    Manager.cpp
         OutputSurface(..)

    Replace:

         for (size_t i = 1 ; i < outputFactorOrder.size() ; i++) {
           const Factor *factor = phrase.GetFactor(pos, outputFactorOrder[i]);
           if (factor) out << fd << *factor;
           else        out << fd << UNKNOWN_FACTOR;
         }

    With:

         for (size_t i = 1 ; i < outputFactorOrder.size() ; i++) {
           const Factor *factor = phrase.GetFactor(pos, outputFactorOrder[i]);
           if (factor) out << fd << *factor;
    //      else        out << fd << UNKNOWN_FACTOR;
         }



    Best regards,

    Etienne


    On 07/06/2017 at 17:23, Hieu Hoang wrote:
    there's probably a bug somewhere in the server code

    * Looking for MT/NLP opportunities *
    Hieu Hoang
    http://moses-smt.org/


    On 7 June 2017 at 11:02, Etienne Monneret (LM)
    <[email protected] <mailto:[email protected]>> wrote:

        Hi!

        I just re-compiled a Moses server.

        Now, the "report-all-factors" option is marking all words as UNK:

        taking|UNK|UNK|UNK advantage|UNK|UNK|UNK of|UNK|UNK|UNK
        the|UNK|UNK|UNK
        |0-1| mushrooms|UNK|UNK|UNK |2-2| relaxante|UNK|UNK|UNK |6-6|
        is|UNK|UNK|UNK an|UNK|UNK|UNK activity|UNK|UNK|UNK |3-5|
        .|UNK|UNK|UNK |7-7|

        This happens only in the XML-RPC reply: in the mosesserver
        log, only the right words are marked as UNK:

        [moses/server/TranslationRequest.cpp:472] BEST TRANSLATION:
        taking
        advantage of the mushrooms relaxante|UNK|UNK|UNK is an activity .
        [11111111]  [total=-108.208]
        core=(-100.000,-10.000,5.000,-6.851,-16.168,-4.675,-21.479,-8.000,-60.608)

        Is there a new way to do this?

        Best regards,

        Etienne


        _______________________________________________
        Moses-support mailing list
        [email protected] <mailto:[email protected]>
        http://mailman.mit.edu/mailman/listinfo/moses-support
        <http://mailman.mit.edu/mailman/listinfo/moses-support>






