Hi Ondrej,
I am not aiming so much at quality improvement; I just want to have a 
very flexible data structure. I am planning a series of experiments 
where cruel and twisted things will be done to the source language, and 
I thought this might be a good way to speed up experimenting without 
having to retrain every time. Your idea should not be hard to implement; 
I will take it into account.
Best,
Marcin
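
A rough sketch of the factor split from Ondrej's message quoted below: 
index the corpus on one factor (1, the lemma) but return the phrases with 
other factors (0 and 2, the form and the tag). It assumes Moses-style 
tokens with factors separated by "|". The names project and build_index 
and the toy corpus are made up for illustration, and a plain dictionary 
stands in for the Lucene index, so this is not the CLucene API.

```python
from collections import defaultdict


def project(sentence, factors, sep="|"):
    """Keep only the given factor indices of every token in a sentence."""
    out = []
    for token in sentence.split():
        parts = token.split(sep)
        out.append(sep.join(parts[i] for i in factors))
    return out


def build_index(corpus, index_factors):
    """Map each projected token to the ids of the sentences containing it."""
    index = defaultdict(set)
    for sent_id, sentence in enumerate(corpus):
        for key in project(sentence, index_factors):
            index[key].add(sent_id)
    return index


# Hypothetical toy corpus: form|lemma|tag for every token.
corpus = [
    "houses|house|NNS are|be|VBP big|big|JJ",
    "the|the|DT house|house|NN is|be|VBZ small|small|JJ",
]

# Index on factor 1 (lemma) only.
index = build_index(corpus, index_factors=[1])

# Look up by lemma, then read the matches back as form|tag (factors 0 and 2).
hits = sorted(index["house"])
phrases = [project(corpus[i], [0, 2]) for i in hits]
```

Looking up the lemma "house" finds both sentences even though the 
surface forms differ, and the matches come back carrying only the form 
and the tag.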

On 24.07.2013 15:59, Ondrej Bojar wrote:
> Hi, Marcin,
>
> this could be quite useful (although we did not see any improvements in our 
> experiments, see our WMT(12,11?) paper Selecting Data in EN->CS Translation).
>
> It can be useful to index a different factor than the factors that one 
> eventually wants to use in translation. So the config should allow one to say: of 
> all the factors, I want to index these (e.g. 1, the lemma) in the source side 
> of the parallel corpus, and I want to see these as the source side of the 
> phrases (e.g. 0&2, the form and the tag). It is also a question whether the 
> input sentence should be expected to have the same structure of factors as 
> the training corpus.
>
> Looking forward to whatever you decide to implement.
>
> Cheers, O.
>
> "Hieu Hoang" <[email protected]> wrote:
>
>> yes, copy PhraseDictionaryDynSuffixArray, or indeed, your own
>> PhraseDictionaryCompact.
>>
>> there's some docs on adding feature functions.
>>    http://www.statmt.org/moses/?n=Moses.FeatureFunctions
>>
>> it should be easier than a year ago when you added PhraseDictionaryCompact
>>
>> On 24 July 2013 13:29, Marcin Junczys-Dowmunt <[email protected]> wrote:
>>
>>> Hi list,
>>> I am planning to integrate a dynamic phrase table based on a Lucene
>>> index and I am wondering how to approach it. Basically it should have
>>> the same functionality as the Dynamic Suffix Array already present in
>>> Moses. Now I am wondering if for a first working version it would be
>>> enough just to hijack DynSuffixArray.h and implement all the public
>>> functions with CLucene. Any hidden pitfalls there? The index would be
>>> indexed by the source sentence and contain alignment data and target
>>> sentences as fields.
>>> Best,
>>> Marcin
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>> -- 
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu

