Hi all,

I managed to extract rules from a parsed parallel corpus (thanks to Hieu!), but some of them contain XML strings that I believe should not be there, for example:

( [pu] ||| <tree [pu] ||| ||| 0.0526316 1 0.025 1 2.718 ||| 19 40
, [pu] ||| </tree> [pu] ||| ||| 0.357143 1 0.0028169 1 2.718 ||| 14 1775
, [pu] ||| <tree [pron-pers] ||| ||| 0.166667 1 0.00056338 1 2.718 ||| 6 1775
</tree> <tree label="np"> <tree [np] ||| a base de cimento [np] ||| ||| 1 1 1 1 2.718 ||| 0.5 0.5

I've checked the XML and it seems to be alright. Is this something expected?

Thanks a lot,
Lucia
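
P.S. To gauge how many rules are affected, something like the following could be run over the extracted rule table. This is only a minimal sketch and not part of Moses: the names `XML_DEBRIS`, `split_rule`, `has_xml_debris` and `flag_rules` are made up for illustration, and it assumes one rule per line with fields separated by `|||`, and that any token starting with `<tree` or `</tree`, or looking like an attribute such as `label="np">`, is leftover markup.

```python
import re

# Assumption: tokens like <tree, </tree> or label="np"> are leftover XML
# markup rather than real words or non-terminals.
XML_DEBRIS = re.compile(r'^(</?tree\b.*|\w+="[^"]*">?)$')


def split_rule(line):
    """Split one rule-table line into its '|||'-separated fields."""
    return [field.strip() for field in line.split("|||")]


def has_xml_debris(line):
    """Return True if either of the first two fields has a markup-like token."""
    fields = split_rule(line)
    if len(fields) < 2:
        return False
    tokens = fields[0].split() + fields[1].split()
    return any(XML_DEBRIS.match(tok) for tok in tokens)


def flag_rules(lines):
    """Return the rules (as stripped strings) that contain markup-like tokens."""
    return [line.strip() for line in lines if has_xml_debris(line)]
```

For a rule table on disk, `flag_rules(open(path, encoding="utf-8"))` would return the suspicious rules, where `path` is whatever file the extraction step wrote. Only the first two fields are checked, so a scores or counts field is never flagged.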
