A couple of weeks ago, I started a thread, "Moses caching questions" 
 (below). After some debugging, I discovered the problem was related to 
 the issue discussed in this thread, and not to the -persistent-cache-size 
 option (default = 10,000 phrases).

 My Python code (subprocess.Popen) captured stdout and stderr through 
 pipes but never read from the stderr pipe. The pipe buffer filled after 
 many thousands of translations and everything halted, because moses 
 blocked on its next write to stderr. In my first attempt to fix it, I 
 created a new thread to service stdout and used stderr output to control 
 the program flow. This was a mistake because, as this thread notes, 
 stderr raced ahead of stdout. The program flow erroneously continued 
 before translation completed, and I lost translations.

 The successful solution spins off a thread to service stderr and the 
 program flow pauses until the stdout phrase count matches the stdin 
 phrase count. I could easily save the stderr buffer to a file. However, 
 if moses crashed, possibly because of a missed | character, it's almost 
 impossible to trace/align the crash to the stderr output, as addressed 
 in this thread. Therefore, I simply discard the stderr output. I processed 
 over 5.5 million phrase translations with this solution without any 
 buffer errors or "messed up stderr output".
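
 A minimal sketch of the approach described above, not the actual code. 
 It assumes the decoder writes exactly one stdout line per stdin line and 
 flushes stdout after each one. The helper names (drain, feed, 
 start_decoder, translate_batch) are made up for illustration, and cmd 
 stands for whatever moses command line you normally run.

```python
import subprocess
import threading


def drain(stream):
    """Read and discard every line so the stderr pipe buffer can never fill."""
    for _ in iter(stream.readline, ""):
        pass
    stream.close()


def feed(stream, lines):
    """Write one phrase per line; stdin stays open so the decoder sees no EOF."""
    for line in lines:
        stream.write(line + "\n")
    stream.flush()


def start_decoder(cmd):
    """Launch the decoder (cmd is an assumption) and start the stderr drain thread."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the Python side
    )
    threading.Thread(target=drain, args=(proc.stderr,), daemon=True).start()
    return proc


def translate_batch(proc, lines):
    """Send a batch, then pause until the stdout count matches the stdin count."""
    # Writing from a separate thread avoids blocking on a full stdin pipe
    # while the main thread is busy reading stdout.
    writer = threading.Thread(target=feed, args=(proc.stdin, lines), daemon=True)
    writer.start()
    results = []
    while len(results) < len(lines):
        out = proc.stdout.readline()
        if not out:  # EOF on stdout: the decoder exited early
            raise RuntimeError(
                "decoder stopped after %d of %d lines" % (len(results), len(lines))
            )
        results.append(out.rstrip("\n"))
    writer.join()
    return results
```

 Program flow is gated on stdout only, so it does not matter how far 
 stderr runs ahead. If the stderr output is never needed at all, passing 
 stderr=subprocess.DEVNULL to Popen is a simpler way to discard it, and no 
 drain thread is required.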

 I vote for no change to the threading queue on this issue. Prepending a 
 thread number to stderr output won't help because the stdout data isn't 
 tagged with a thread number. Even if they were, there's no guarantee the 
 thread-tagged stderr outputs would be in the same order as the stdout 
 that caused the crash.

 Once you understand how the Moses binary's stdin, stdout and stderr 
 work, they're easy to manage. I don't know if moses_chart behaves the 
 same way. I agree with 
 Barry. If you really need to do serious debugging, drop to 
 single-threaded mode and you get aligned output between stderr/stdout.

 Tom


 -------- Original Message --------
 Subject: Re: [Moses-support] Moses caching questions
 Date: Mon, 29 Aug 2011 19:46:43 +0700
 From: Tom Hoar <[email protected]>
 To: <[email protected]>

 Thanks Barry.

 The quote comes from http://www.statmt.org/moses/manual/manual.pdf, 
 page 185, section 5.4.4, "Caching across Sentences" paragraph.

 I understood Ivan's problem. My script is only remotely similar. I also 
 frequently use the multi-threading/multiprocessor configuration. 
 However, this new multi-threading module provides non-blocking service 
 of stderr and stdout output on separate threads. It's possible for my 
 queue to piggyback input from multiple text file sources into the 
 stream. So, moses never experiences a break with an EOF. This could 
 conceivably be seen by moses as one huge file with tens of 
 thousands of lines.

 My machine is a 4 GB Core2Quad. The "top" utility reports my memory 
 stabilizes to about 100KB of free RAM during runtime. Swap file usage 
 remains stable at only 68KB used out of 6 GB. I'm using a 
 binarized/memmapped KenLM, trained with SRILM, configured with code 9 
 and binarized/memmapped tables.

 I like your idea of testing the moses core without the Python wrapper. 
 First, I have a work-around and must finish processing this batch. Then, 
 I'll save about 200 tokenized files. If I process them with the same moses 
 configuration using a Bash script that redirects the input as discrete 
 files, moses will see an EOF and reload between files. This is different 
 from my configuration. So, I'll concatenate the files to one huge file 
 and then process over 10,000 lines in one pass.
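
 A minimal sketch of the concatenation step, assuming one sentence per 
 line in each tokenized file. The function name and the file paths are 
 hypothetical. It adds a trailing newline where a file lacks one, so the 
 last sentence of one file is not fused with the first sentence of the 
 next.

```python
def concatenate(sources, target):
    """Append every source file to target and return the total line count."""
    total = 0
    with open(target, "w", encoding="utf-8") as out:
        for src in sources:
            with open(src, encoding="utf-8") as handle:
                for line in handle:
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")
                    total += 1
    return total
```

 The returned count can then be compared with the number of lines moses 
 writes to stdout for the single large input.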

 It'll take me about a week to get to this. I'll let you know. By the 
 way, I use the Dash redirection for evaluation. During evaluation and 
 MERT, the "top" utility shows peak CPU usage at "only" 385% on a 
 quad-core. Is this also your experience with multi-threading on 
 multiprocessors?

 Tom



 On Wed, 14 Sep 2011 15:08:36 +0200, Christian Hardmeier <[email protected]> 
 wrote:
> What would be really nice is having the log lines prefixed with the
> thread number and/or sentence number plus some kind of locking to
> ensure that the lines don't mix. This would give you real-time
> logging, and if you want the logs separated, you could easily use
> sort or grep to do that.
> But I don't know if it's worth the effort implementing that.
>
> /Christian
>
> On Sep 14, 2011, at 2:59 PM, Barry Haddow wrote:
>
>> Kenneth's right. The logging output was intentionally left untouched
>> when the multi-threading was added, and I can't see any reason to
>> change this. If you're debugging using verbose output, then you
>> nearly always want to run single-threaded.
>>
>> cheers - Barry
>>
>> On Wednesday 14 September 2011 13:39:12 Kenneth Heafield wrote:
>>> So what exactly is the issue?  Progress can be monitored with stdout.
>>> If stderr is queued, then you won't get sub-sentential progress anyway.
>>>
>>> I'd rather stderr tell me what it's doing so if/when there's a segfault,
>>> I have a place to start.
>>>
>>> Kenneth
>>>
>>> On 09/14/11 13:32, Phil Williams wrote:
>>>> Yes, that would work, it just needs someone to spend the time going
>>>> through the source and fixing the logging code.  It's a bigger -- and
>>>> less critical -- job than for the rest of the output, so hasn't been
>>>> done yet.
>>>>
>>>> Phil
>>>>
>>>> On 14 Sep, 2011, at 01:12 PM, Taylor Rose <[email protected]> wrote:
>>>>> Couldn't you queue the stderr logging to solve this issue?
>>>>>
>>>>> On Wed, 2011-09-14 at 08:55 +0000, Phil Williams wrote:
>>>>>> Hi Tom,
>>>>>>
>>>>>>
>>>>>> yes, that's right. In multithreaded moses / moses_chart, the
>>>>>> translations, n-best, and trace output for sentence n are all queued
>>>>>> until the output from sentence n-1 has been written. The queueing
>>>>>> doesn't happen for the logging output that goes to stderr -- it's
>>>>>> written immediately -- so it will appear out of order and out of sync
>>>>>> with the rest of the output.
>>>>>>
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>> On 14 Sep, 2011, at 01:33 AM, Tom Hoar <[email protected]> wrote:
>>>>>>> Phil,
>>>>>>>
>>>>>>> Re "output to stderr will be messed up"... do you mean that the
>>>>>>> order and timing of stderr output will be out of synch with the
>>>>>>> output to stdout? I found this to be the case with multi-threaded
>>>>>>> moses and therefore stderr output can not be used to monitor
>>>>>>> progress and/or control workflows.
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 13 Sep 2011 19:27:47 +0000 (GMT), Phil Williams <[email protected]> wrote:
>>>>>>>> I think GENERAL:cores sets the maximum number of active EMS steps,
>>>>>>>> it doesn't change the number of threads used for decoding. You
>>>>>>>> need to set the decoder's -threads N option in
>>>>>>>> TUNING:decoder-settings and/or EVALUATION:decoder-settings.
>>>>>>>>
>>>>>>>>
>>>>>>>> A caveat is that the output to stderr will be messed up, though
>>>>>>>> that's true for multi-threaded moses as well.
>>>>>>>>
>>>>>>>>
>>>>>>>> Phil
>>>>>>>>
>>>>>>>> On 13 Sep, 2011,at 08:11 PM, Hieu Hoang wrote:
>>>>>>>>> Is it as simple as setting the
>>>>>>>>> [GENERAL]
>>>>>>>>> cores = 8
>>>>>>>>> in the config file, and making sure the decoding was compiled
>>>>>>>>> with threads?
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Moses-support mailing list
>>>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>