Hi Tom,
you are right, you should have the same problem when you decode the same
data. In my case, the vertical bar is the main issue (for the moment).
Would it be possible to share your list of characters?

Thanks a lot and good luck!
Marco




On Tue, Aug 30, 2011 at 1:47 PM, Tom Hoar <[email protected]> wrote:

> Thanks Marco,
>
> I considered that, but it doesn't explain why resuming on the same file is
> successful without any changes. Time and data volume seem to be the culprits
> here, and our work-around to auto-resume seems to be holding for now.
>
> I'm aware of the reserved characters. Our pre-processing tool chain escapes
> 33 non-printing ASCII/ANSI control characters, plus the vertical bar, plus 5
> reserved XML characters. Nonetheless, we had two files last night that
> caused a broken pipe (not a symptom of the time/volume problem above).
> Resuming after these interruptions was not successful and we skipped the
> files. These two files seem to contain a non-printing character, despite our
> best efforts to escape everything. I suspect it's a non-printing Unicode
> character, such as en space (U+2002 ISOpub), em space (U+2003
> ISOpub), thin space (U+2009 ISOpub), etc. Again, more to do next week.
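
One way to hunt for the kind of character suspected above is to scan each line by Unicode general category. This is only a minimal sketch: the function name `find_suspect_chars`, the set of categories, and the choice to also flag the vertical bar are assumptions for illustration, not part of the tool chain described in this thread.

```python
import unicodedata

# Unicode general categories treated as "suspect" in this sketch (an assumption):
# Cc = control, Cf = format, Zs = space separator, Zl/Zp = line/paragraph separator.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def find_suspect_chars(line, reserved="|"):
    """Return (index, code point, name, category) for each suspect character.

    `line` should already have its trailing newline stripped, otherwise the
    newline itself (category Cc) is reported.
    """
    hits = []
    for i, ch in enumerate(line):
        if ch == " ":
            continue  # the ordinary ASCII space is expected in sentences
        cat = unicodedata.category(ch)
        if cat in SUSPECT_CATEGORIES or ch in reserved:
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, "U+%04X" % ord(ch), name, cat))
    return hits
```

Running this over the two files that caused the broken pipe, line by line, would show whether characters such as U+2002 or U+2009 are present before deciding how to escape them.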
>
> Tom
>
>
>
> On Mon, 29 Aug 2011 14:16:02 +0200, marco turchi <[email protected]>
> wrote:
>
> Hi Tom,
> I'm running something similar to your wrapper in Java on a 16-core
> (thanks to hyperthreading) machine, and a common problem that I had at the
> beginning was the presence of "|" characters in the source sentence.
>
> Cheers
> Marco
>
> On Mon, Aug 29, 2011 at 1:58 PM, Barry Haddow <[email protected]> wrote:
>
>> Hi Tom
>>
>> If one of the moses caches was filling up, then I would expect that the
>> process memory would increase, until the machine ground to a halt. The
>> problem that Ivan had with the original version of his wrapper was
>> slightly different: there was a fixed-size i/o buffer that he wasn't
>> emptying, which eventually deadlocked his process.
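
The deadlock described above can be avoided by reading the child's output continuously while writing its input. The following is a rough sketch of that pattern, not the code from either wrapper: `run_through_pipe` and `ECHO_CMD` are hypothetical names, and the child process is a stand-in that upper-cases its input rather than a real decoder.

```python
import subprocess
import sys
import threading


def run_through_pipe(cmd, lines):
    """Feed lines to a child process while a second thread drains its stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)
    results = []

    def drain():
        # Read continuously so the child never blocks on a full stdout pipe.
        for out_line in proc.stdout:
            results.append(out_line.rstrip("\n"))

    reader = threading.Thread(target=drain)
    reader.start()
    for line in lines:
        proc.stdin.write(line + "\n")
    proc.stdin.close()  # signal end of input to the child
    reader.join()
    proc.wait()
    return results


# Stand-in child that echoes stdin in upper case (an assumption for the demo);
# a real decoder command line would go here instead.
ECHO_CMD = [sys.executable, "-c",
            "import sys\nfor l in sys.stdin: sys.stdout.write(l.upper())"]
```

Without the reader thread, writing a large batch before reading anything can fill the OS pipe buffer in both directions, at which point parent and child each wait on the other.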
>>
>> The lm cache that you mentioned below is, as far as I'm aware, specific to
>> irstlm, so if you're not using irstlm, then the flag should have no
>> effect.
>>
>> The translation option cache mainly helps by caching translation options
>> for common phrases like 'the' or '.'. It is implemented as an LRU cache,
>> and the decoder removes an entry when the cache reaches maximum size. I
>> don't understand the quote from the manual about making sure this cache is
>> frequently cleared - could you tell me where this quote comes from? Tuning
>> the size of the translation options cache may help with performance, but
>> it's unlikely to be the cause of the unexplained crashes.
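
To illustrate why an LRU cache of this kind stays bounded, here is a minimal sketch of the eviction behaviour. It is a generic illustration only, not the Moses implementation; the class name `LRUCache` and its methods are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._data)
```

Because an entry is removed whenever the size limit is exceeded, memory use levels off instead of growing with the number of sentences decoded.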
>>
>> We frequently run multi-threaded decodes on multi-core machines and haven't
>> witnessed any unexplained crashes. So I would quite like to eliminate the
>> python/moses interaction as a cause of error. Is it possible to run a
>> similar experiment without the python wrapper, say by just passing moses
>> your source sentences in a file? If it is moses that is crashing, then if
>> you could allow it to generate a core file and make it available, then I'd
>> have some chance of debugging it.
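
The experiment suggested above (sentences from a file, core dumps allowed) could be scripted roughly as follows. This is a sketch under assumptions: `decode_file` and `allow_core_dumps` are hypothetical helper names, it relies on the Unix-only `resource` module, and whether a core file is actually written also depends on system settings outside this script.

```python
import resource
import subprocess


def allow_core_dumps():
    # Raise the soft core-size limit to the hard limit in the child process,
    # similar in spirit to running `ulimit -c` in a shell beforehand.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def decode_file(cmd, input_path, output_path):
    """Run cmd with stdin and stdout bound directly to files, with no pipes."""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        proc = subprocess.run(cmd, stdin=src, stdout=dst,
                              preexec_fn=allow_core_dumps)
    # A negative return code means the child was killed by that signal number.
    return proc.returncode
```

Assuming the usual `moses -f moses.ini` invocation, a call might look like `decode_file(["moses", "-f", "moses.ini"], "source.txt", "target.txt")`, where all three file names are placeholders.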
>>
>> cheers - Barry
>>
>>
>>
>> On Monday 29 August 2011 12:23, Tom Hoar wrote:
>> > I've implemented a multi-threaded Python wrapper that loads moses
>> > decoder and pipes strings through the moses binary. It's similar to Ivan
>> > Uemlianin's code from May 04, 2010 on this listserv, but achieves a
>> > throughput efficiency of 398% CPU load on a quad-core host across multiple
>> > documents processed in a queue.
>> >
>> > Here's the rub. The decoder & the wrapper run great for about 2 hours.
>> > Then they halt with an unknown error. It's difficult to trace because it
>> > takes hours to reproduce. I can see that the Moses binary doesn't
>> > generate an error exit code. There's no error message about a "broken"
>> > pipe. When I restart the script on the file that was in-process at the
>> > time of the halt, it runs just fine and continues processing. Since the
>> > error occurs consistently at the 2 hour mark, and it's not the file
>> > causing the halt, I suspect that a cache or buffer somewhere is
>> > overloaded. I've checked my python code, and don't believe there are any
>> > buffer overruns there.
>> >
>> > I'm hoping someone can review my comments and give me some pointers
>> > about Moses' caches and how to verify and manage the caches. The Moses
>> > manual describes three cache options:
>> >
>> >       * "-clean-lm-cache: clean language model caches after N
>> > translations (default N=1)" : If -clean-lm-cache defaults to cleaning
>> > the lm cache after each translation, I don't think this is a problem.
>> >
>> >       * "-persistent-cache-size: maximum size of cache for translation
>> > options (default 10,000 input phrases)" : Some of my files have
>> > 2,500 or more pages with 20-25 sentence lines each. This could
>> > exceed the default 10,000 input phrase cache. Would it be better to bump
>> > up the -persistent-cache-size value, or manage the number of phrases I
>> > pass to the input?
>> >
>> >       * "-use-persistent-cache: cache translation
>> > options across sentences (default true)" : Regarding caching across
>> > sentences (which presumably applies to -use-persistent-cache), the manual
>> > says, "you should also make sure that the cache is frequently cleared."
>> > How do I clear the cache? Does this require forcing moses itself to
>> > unload, and then reload it? Also, the -use-persistent-cache value
>> > defaults to "true". What is the effect of changing this to "false"? Does
>> > it effectively disable this cache and eliminate the requirement to clear
>> > it?
>> >
>> > Thanks,
>> > Tom
>>
>>  --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
