Thank you, Suzy!

Running the factored example and looking at the files it produces
does indeed sound like the way to go, if no documentation or
clarification elsewhere answers my questions.

Best,
Sonja


2010/9/23 Suzy Howlett <[email protected]>:
> Hi Sonja,
>
> I'm afraid I haven't used factored models through the EMS, so I can't answer
> your question directly; perhaps someone with more experience will be able to
> add something. I would suggest running the config.factored example in the
> EMS example directory and seeing what the intermediate files look like to work
> out at which step you can insert your file.
>
> From the code, it looks like specifying config information under
> [INPUT-FACTOR] and [OUTPUT-FACTOR] will take plain text and generate
> factored data, stored as a file in the "corpus" directory with "factored" in
> the name. If you find this file and it's of the same format as yours, I
> expect you will be able to use your data without any problems. If the file
> is in a different format you will probably have to write a script to convert
> your file to the format you see there.
>
> My best guess is - once your data is in the same format as the
> corpus/factored file - then you just specify your file as the value of
> "factorized-stem" (instead of "raw-stem" or "tokenized-stem" etc.) under
> [CORPUS]. It looks like you'll also have to specify some information like
> "input-factors", "alignment-factors" etc. under [TRAINING].
>
> I imagine creating factored and unfactored models will then proceed the
> same way, the only difference being the factor information you include under
> [TRAINING]. (That is, an unfactored model is a factored model using only a
> factor that maps surface form to surface form.)
>
> Keep in mind I haven't tried any of this, but I hope it helps you get
> started.
>
> Best,
> Suzy
>
> On 23/09/10 1:41 AM, Sonja PETROVIĆ LUNDBERG wrote:
>>
>> Hi!
>>
>> I have factored corpus data in the following Moses format:
>>  86|86|NUM|NUM_P|NPHR|_ .|_|_|_|_|_
>> On|on|PRP|PRP|ADVL>|investigate_application
>> application|application|N|N_S_NOM|P<|S_NOM
>> by|by|PRP|PRP|N<|application_Member=State a|a|ART|ART_S|>N|_
>> Member=State|Member=State|N|N_S_NOM|P<|S_NOM or|or|KC|KC|CO|_
>> on|on|PRP|PRP|ADVL>|investigate_initiative
>> its|it|ADJ|PERS_NEU_3S_GEN|>N|S_NOM own|own|DET|DET|>N|S_NOM
>> initiative|initiative|N|N_S_NOM|P<|S_NOM ,|_|_|_|_|_
>> and|and|KC|KC|CO|_ in|in|PRP|PRP|ADVL>|investigate_co-operation
>> co-operation|co-operation|N|N_S_NOM|P<|S_NOM
>> with|with|PRP|PRP|N<|co-operation_authority the|the|ART|ART|>N|_
>> competent|competent|ADJ|ADJ_POS|>N|P_NOM authorities|authority|N|N_P_NOM|P<|P_NOM
>> in|in|PRP|PRP|N<|authority_Member=States the|the|ART|ART|>N|_
>> Member=States|Member=States|N|N_S_NOM|P<|S_NOM ,|_|_|_|_|_
>> who|who|INDP|INDP|SUBJ>|_ shall|shall|V|V_PR|FS-N<|_
>> give|give|UNKNOWN|UNKNOWN|_|_ it|it|N|PERS_NEU_3S_ACC|<DAT|S_ACC
>> their|they|ADJ|PERS_3P_GEN|>N|S_ACC
>> assistance|assistance|N|N_S_ACC|<ACC|S_ACC ,|_|_|_|_|_
>> the|the|ART|ART|>N|_ Commission|commission|N|N_S_NOM|SUBJ>|S_NOM
>> shall|shall|V|V_PR|FS-STA|_ investigate|investigate|V|V_INF|ICL-AUX<|_
>> cases|case|N|N_P_ACC|<ACC|P_ACC of|of|PRP|PRP|N<|case_infringement
>> suspected|suspected|ADJ|ADJ_POS|>N|S_NOM
>> infringement|infringement|N|N_S_NOM|P<|S_NOM
>> of|of|PRP|PRP|N<|infringement_principle these|this|DET|DET_P|>N|P_NOM
>> principles|principle|N|N_P_NOM|P<|P_NOM .|_|_|_|_|_
>>
>> Factors (surface form, lemma, POS, morphology, phrase role, dependency
>> information) are separated by a vertical bar, words are separated by
>> spaces, and the x-th line in one language file is aligned to the x-th
>> line in the other language file.
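
Before handing a file like this to any training pipeline, it may be worth
checking that every token really carries the same number of factors. The
following is a minimal sketch, assuming six factors per token as listed
above; the function names are made up for illustration and are not part of
Moses or EMS. The second sample line imitates a token split in two by a
stray line wrap.

```python
def parse_factored_line(line, sep="|"):
    """Split one corpus line into tokens, each a list of factors."""
    return [token.split(sep) for token in line.split()]


def factor_count_report(lines, expected=6):
    """Return (line_number, token) pairs whose factor count is not `expected`.

    `expected=6` is an assumption taken from the factor list in this thread.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        for factors in parse_factored_line(line):
            if len(factors) != expected:
                problems.append((lineno, "|".join(factors)))
    return problems


sample = [
    "On|on|PRP|PRP|ADVL>|investigate_application "
    "application|application|N|N_S_NOM|P<|S_NOM",
    # A token broken in two by whitespace, as a wrapped line would produce:
    "competent|competent |ADJ|ADJ_POS|>N|P_NOM",
]
print(factor_count_report(sample))
```

For real data, the same function can be called on an open file object, since
it only iterates over lines. An empty result means every token has the
expected number of factors.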
>>
>> Can data in this format be used by EMS? If yes, how and where should I
>> describe my files in config.factored? If no, what should I convert it
>> to?
>>
>> I also wonder how I can make EMS use only the surface form and create
>> an unfactored language and translation model.
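
One way to get an unfactored corpus, independent of whatever EMS can do
through its configuration, is to strip the extra factors in a preprocessing
step and treat the result as plain tokenized text. This is only a sketch
under that assumption; `select_factors` is a hypothetical helper, not a
Moses script.

```python
def select_factors(line, indices=(0,), sep="|"):
    """Keep only the factors at `indices` for every token on the line.

    Index 0 is the surface form in the format described in this thread.
    A token with fewer factors than requested raises IndexError.
    """
    kept = []
    for token in line.split():
        factors = token.split(sep)
        kept.append(sep.join(factors[i] for i in indices))
    return " ".join(kept)


line = "its|it|ADJ|PERS_NEU_3S_GEN|>N|S_NOM own|own|DET|DET|>N|S_NOM"
print(select_factors(line))          # its own
print(select_factors(line, (0, 2)))  # its|ADJ own|DET
```

Applied line by line to both language files, this keeps the line alignment
intact, because each input line produces exactly one output line.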
>>
>> Regards,
>> Sonja
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Suzy Howlett
> http://www.showlett.id.au/
>
