Thank you, Suzy! Running the factored example and looking at the files it produces does indeed sound like the way to go, if there is no documentation or clarification elsewhere that could answer my questions.
Best,
Sonja

2010/9/23 Suzy Howlett <[email protected]>:
> Hi Sonja,
>
> I'm afraid I haven't used factored models through the EMS, so I can't answer
> your question directly; perhaps someone with more experience will be able to
> add something. I would suggest running the config.factored example in the
> EMS example directory and see what the intermediate files look like to work
> out at which step you can insert your file.
>
> Looking at the code, it looks like specifying config information under
> [INPUT-FACTOR] and [OUTPUT-FACTOR] will take plain text and generate
> factored data, stored as a file in the "corpus" directory with "factored" in
> the name. If you find this file and it's of the same format as yours, I
> expect you will be able to use your data without any problems. If the file
> is in a different format you will probably have to write a script to convert
> your file to the format you see there.
>
> My best guess is - once your data is in the same format as the
> corpus/factored file - then you just specify your file as the value of
> "factorized-stem" (instead of "raw-stem" or "tokenized-stem" etc.) under
> [CORPUS]. It looks like you'll also have to specify some information like
> "input-factors", "alignment-factors" etc. under [TRAINING].
>
> I imagine creating factored and unfactored models will then proceed the
> same; the only difference being in the factors information you include under
> [TRAINING]. (That is, an unfactored model is a factored model using only a
> factor that maps surface form to surface form.)
>
> Keep in mind I haven't tried any of this, but I hope it helps you get
> started.
>
> Best,
> Suzy
>
> On 23/09/10 1:41 AM, Sonja PETROVIĆ LUNDBERG wrote:
>>
>> Hi!
>>
>> I have factored corpus data in the following Moses format:
>> 86|86|NUM|NUM_P|NPHR|_ .|_|_|_|_|_
>> On|on|PRP|PRP|ADVL>|investigate_application
>> application|application|N|N_S_NOM|P<|S_NOM
>> by|by|PRP|PRP|N<|application_Member=State a|a|ART|ART_S|>N|_
>> Member=State|Member=State|N|N_S_NOM|P<|S_NOM or|or|KC|KC|CO|_
>> on|on|PRP|PRP|ADVL>|investigate_initiative
>> its|it|ADJ|PERS_NEU_3S_GEN|>N|S_NOM own|own|DET|DET|>N|S_NOM
>> initiative|initiative|N|N_S_NOM|P<|S_NOM ,|_|_|_|_|_
>> and|and|KC|KC|CO|_ in|in|PRP|PRP|ADVL>|investigate_co-operation
>> co-operation|co-operation|N|N_S_NOM|P<|S_NOM
>> with|with|PRP|PRP|N<|co-operation_authority the|the|ART|ART|>N|_
>> competent|competent|ADJ|ADJ_POS|>N|P_NOM authorities|authority|N|N_P_NOM|P<|P_NOM
>> in|in|PRP|PRP|N<|authority_Member=States the|the|ART|ART|>N|_
>> Member=States|Member=States|N|N_S_NOM|P<|S_NOM ,|_|_|_|_|_
>> who|who|INDP|INDP|SUBJ>|_ shall|shall|V|V_PR|FS-N<|_
>> give|give|UNKNOWN|UNKNOWN|_|_ it|it|N|PERS_NEU_3S_ACC|<DAT|S_ACC
>> their|they|ADJ|PERS_3P_GEN|>N|S_ACC
>> assistance|assistance|N|N_S_ACC|<ACC|S_ACC ,|_|_|_|_|_
>> the|the|ART|ART|>N|_ Commission|commission|N|N_S_NOM|SUBJ>|S_NOM
>> shall|shall|V|V_PR|FS-STA|_ investigate|investigate|V|V_INF|ICL-AUX<|_
>> cases|case|N|N_P_ACC|<ACC|P_ACC of|of|PRP|PRP|N<|case_infringement
>> suspected|suspected|ADJ|ADJ_POS|>N|S_NOM
>> infringement|infringement|N|N_S_NOM|P<|S_NOM
>> of|of|PRP|PRP|N<|infringement_principle these|this|DET|DET_P|>N|P_NOM
>> principles|principle|N|N_P_NOM|P<|P_NOM .|_|_|_|_|_
>>
>> Factors (surface form, lemma, POS, morphology, phrase role, dependency
>> information) are divided by vertical line, words are divided by space
>> and x:th line in one language file is aligned to the x:th line in
>> another language file.
>>
>> Can data in this format be used by EMS? If yes, how and where should I
>> describe my files in config.factored? If no, what should I convert it
>> to?
>>
>> I also wonder how I can make EMS use only the surface form and create
>> an unfactored language and translation model.
>>
>> Regards,
>> Sonja
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Suzy Howlett
> http://www.showlett.id.au/
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
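
On the question of using only the surface form: one possible route, independent of EMS itself, is to strip the unwanted factors from the corpus files before training. The following is only a minimal sketch, assuming the format described in the original message (factors separated by "|", words separated by spaces, one sentence per line). The function name select_factors and its keep parameter are made up for illustration and are not part of Moses or EMS.

```python
def select_factors(line, keep=(0,)):
    """Return the line with only the factors at the given indices kept.

    Assumes Moses-style factored text: tokens separated by whitespace,
    factors within a token separated by "|". Index 0 is the surface form.
    (Hypothetical helper, not a Moses or EMS function.)
    """
    selected_tokens = []
    for token in line.split():
        factors = token.split("|")
        # Re-join the chosen factors with "|" so the output is still
        # valid factored text when more than one factor is kept.
        selected_tokens.append("|".join(factors[i] for i in keep))
    return " ".join(selected_tokens)


sample = ("On|on|PRP|PRP|ADVL>|investigate_application "
          "application|application|N|N_S_NOM|P<|S_NOM")

print(select_factors(sample))               # surface forms only
print(select_factors(sample, keep=(0, 2)))  # surface form and POS
```

Applied line by line to both language files, this would give plain text that keeps the line-by-line alignment, which could then presumably be handled like any unfactored corpus. This has not been tested against EMS.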
