On 26 October 2012 13:57, Raymond HS <[email protected]> wrote:
> Hi Jim,
>
> For the Antara website, I think most of their stories are not translations
> (more like comparable than parallel). But I believe there are some of them
> that are direct translations. Actually it will be good if Bitextor can use
> some linguistic information (like bilingual dictionary) during the alignment
> process. :)

IIRC, Bitextor only uses document structure. If you already have a set of
aligned documents, Hunalign can use a dictionary to improve existing sentence
alignments, and maligna can additionally create IBM Model 1 models.

Finding parallel document pairs in comparable corpora is a less researched
problem, but Felipe's doctrans project (http://code.google.com/p/doctrans/)
happily does that - you'll need a phrase table from Moses to use it, though.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
