Hi Sonja,

I'm afraid I haven't used factored models through the EMS, so I can't 
answer your question directly; perhaps someone with more experience will 
be able to add something. I would suggest running the config.factored 
example in the EMS example directory and looking at the intermediate 
files to work out at which step you can insert your file.

From the code, it looks like specifying config information under 
[INPUT-FACTOR] and [OUTPUT-FACTOR] makes the EMS take plain text and 
generate factored data, stored as a file in the "corpus" directory with 
"factored" in the name. If you find this file and it's in the same 
format as yours, I expect you will be able to use your data without any 
problems. If the file is in a different format, you will probably have 
to write a script to convert your file to the format you see there.
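
Before comparing the two files by eye, it may help to check that every 
token in your file carries the same number of factors. Below is a 
minimal sketch in Python; the function name factor_counts is made up 
for illustration, and it assumes "|" only ever appears as the factor 
separator and that tokens are separated by whitespace, as you describe.

```python
from collections import Counter

def factor_counts(lines, sep="|"):
    """Count how many tokens have each number of factors.

    Assumes whitespace-separated tokens and that `sep` never occurs
    inside a factor value (an assumption, not something EMS guarantees).
    """
    counts = Counter()
    for line in lines:
        for token in line.split():
            counts[len(token.split(sep))] += 1
    return counts

# Two lines taken from the sample data in the original message.
sample = [
    "On|on|PRP|PRP|ADVL>|investigate_application",
    "the|the|ART|ART|>N|_ ,|_|_|_|_|_",
]
print(factor_counts(sample))
```

A consistent file should report a single factor count (6 for the sample 
above). Any other count points at tokens that were split or merged 
somewhere along the way.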

My best guess is that, once your data is in the same format as the 
corpus/factored file, you can just specify your file as the value of 
"factorized-stem" (instead of "raw-stem" or "tokenized-stem" etc.) under 
[CORPUS]. It looks like you'll also have to specify some information 
like "input-factors", "alignment-factors" etc. under [TRAINING].

I imagine creating factored and unfactored models will then proceed in 
the same way, the only difference being the factor information you 
include under [TRAINING]. (That is, an unfactored model is a factored 
model using only a factor that maps surface form to surface form.)
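
If restricting the factors under [TRAINING] turns out not to work, a 
fallback would be to strip the extra factors from the corpus yourself 
and train on the resulting plain text. This is a different route from 
the one described above, and I haven't tested it against the EMS. A 
minimal sketch in Python, where keep_factors is a made-up helper name 
and "|" is assumed to appear only as the factor separator:

```python
def keep_factors(line, indices=(0,), sep="|"):
    """Return `line` with only the factors at `indices` kept per token.

    Index 0 is assumed to be the surface form, as in the sample data.
    """
    out = []
    for token in line.split():
        factors = token.split(sep)
        out.append(sep.join(factors[i] for i in indices))
    return " ".join(out)

# One line fragment taken from the sample data in the original message.
line = "its|it|ADJ|PERS_NEU_3S_GEN|>N|S_NOM own|own|DET|DET|>N|S_NOM"
print(keep_factors(line))          # its own
print(keep_factors(line, (0, 2)))  # its|ADJ own|DET
```

Running this over each line of both language files keeps them aligned 
line by line, since no lines are added or dropped.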

Keep in mind I haven't tried any of this, but I hope it helps you get 
started.

Best,
Suzy

On 23/09/10 1:41 AM, Sonja PETROVIĆ LUNDBERG wrote:
> Hi!
>
> I have factored corpus data in the following Moses format:
>   86|86|NUM|NUM_P|NPHR|_ .|_|_|_|_|_
> On|on|PRP|PRP|ADVL>|investigate_application
> application|application|N|N_S_NOM|P<|S_NOM
> by|by|PRP|PRP|N<|application_Member=State a|a|ART|ART_S|>N|_
> Member=State|Member=State|N|N_S_NOM|P<|S_NOM or|or|KC|KC|CO|_
> on|on|PRP|PRP|ADVL>|investigate_initiative
> its|it|ADJ|PERS_NEU_3S_GEN|>N|S_NOM own|own|DET|DET|>N|S_NOM
> initiative|initiative|N|N_S_NOM|P<|S_NOM ,|_|_|_|_|_
> and|and|KC|KC|CO|_ in|in|PRP|PRP|ADVL>|investigate_co-operation
> co-operation|co-operation|N|N_S_NOM|P<|S_NOM
> with|with|PRP|PRP|N<|co-operation_authority the|the|ART|ART|>N|_
> competent|competent|ADJ|ADJ_POS|>N|P_NOM authorities|authority|N|N_P_NOM|P<|P_NOM
> in|in|PRP|PRP|N<|authority_Member=States the|the|ART|ART|>N|_
> Member=States|Member=States|N|N_S_NOM|P<|S_NOM ,|_|_|_|_|_
> who|who|INDP|INDP|SUBJ>|_ shall|shall|V|V_PR|FS-N<|_
> give|give|UNKNOWN|UNKNOWN|_|_ it|it|N|PERS_NEU_3S_ACC|<DAT|S_ACC
> their|they|ADJ|PERS_3P_GEN|>N|S_ACC
> assistance|assistance|N|N_S_ACC|<ACC|S_ACC ,|_|_|_|_|_
> the|the|ART|ART|>N|_ Commission|commission|N|N_S_NOM|SUBJ>|S_NOM
> shall|shall|V|V_PR|FS-STA|_ investigate|investigate|V|V_INF|ICL-AUX<|_
> cases|case|N|N_P_ACC|<ACC|P_ACC of|of|PRP|PRP|N<|case_infringement
> suspected|suspected|ADJ|ADJ_POS|>N|S_NOM
> infringement|infringement|N|N_S_NOM|P<|S_NOM
> of|of|PRP|PRP|N<|infringement_principle these|this|DET|DET_P|>N|P_NOM
> principles|principle|N|N_P_NOM|P<|P_NOM .|_|_|_|_|_
>
> Factors (surface form, lemma, POS, morphology, phrase role, dependency
> information) are divided by vertical line, words are divided by space
> and x:th line in one language file is aligned to the x:th line in
> another language file.
>
> Can data in this format be used by EMS? If yes, how and where should I
> describe my files in config.factored? If no, what should I convert it
> to?
>
> I also wonder how I can make EMS use only the surface form and create
> an unfactored language and translation model.
>
> Regards,
> Sonja
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
Suzy Howlett
http://www.showlett.id.au/