On Jan 13, 2012, at 5:40 PM, Matt Arsenault wrote:

> For a few percent (< about 2%) of results returned, the returned stderr_txt 
> is incorrectly empty (and it has been this way for about a year I think). 
> Instead, we just get blank output like this:

I believe I found the problem. After the process completes, the redirected 
stderr file isn't necessarily written to disk before the client attempts to 
read it, which results in truncated or missing output in some cases. fflush 
alone is not sufficient for this. I haven't seen this happen on any of my 
systems since trying the attached patch.
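
To illustrate the difference between flushing and syncing, here is a minimal 
POSIX-only sketch. It is not taken from the attached diff, and flush_to_disk 
is a hypothetical helper name, not a BOINC function. fflush only moves data 
from the stdio buffer into the kernel; fsync then asks the kernel to commit 
what it has cached for that file to the storage device.

```cpp
#include <cassert>
#include <stdio.h>    // fflush, fileno (fileno is POSIX)
#include <unistd.h>   // fsync (POSIX)

// Hypothetical helper, not part of BOINC: push buffered output all the
// way to the storage device.
// Returns 0 on success, -1 on failure (errno is set by the failing call).
int flush_to_disk(FILE* f) {
    assert(f != nullptr);
    // Step 1: move data from the stdio buffer into the kernel.
    if (fflush(f) != 0) return -1;
    // Step 2: ask the kernel to commit its cached data for this file.
    if (fsync(fileno(f)) != 0) return -1;
    return 0;
}
```

On Windows the second step would need a different call (for example _commit 
or FlushFileBuffers), which this sketch does not cover.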

Also, I found a window of opportunity: if the client quits, is killed, or 
crashes at the wrong moment, a just-completed task will be lost and start over.

In ACTIVE_TASK::handle_exited_app:

   // …. 
    if (!will_restart) {
        copy_output_files();
        int retval = read_stderr_file();
        if (retval) {
            msg_printf(result->project, MSG_INTERNAL_ERROR,
                "read_stderr_file(): %s", boincerror(retval)
            );
        }
        client_clean_out_dir(slot_dir, "handle_exited_app()");
        clear_schedule_backoffs(this);
            // clear scheduling backoffs of jobs waiting for GPU
    }

The slot directory is wiped out immediately after a task completes, but the 
state file is not written immediately. If the client is interrupted between 
those two steps, there's no record that the task completed, and it starts 
fresh again.
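
One way to close such a window is to record the completion durably before 
deleting anything. The sketch below only illustrates that ordering and is not 
what the client or the attached diff necessarily does. All names in it 
(record_completion, clean_out_dir, finish_task, and the one-line-per-task 
state file) are hypothetical stand-ins, not BOINC code.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the client state file: one finished task per line.
bool record_completion(const fs::path& state_file, const std::string& task) {
    std::ofstream out(state_file, std::ios::app);
    out << task << '\n';
    out.flush();
    return static_cast<bool>(out);  // false if the open or the write failed
}

// Hypothetical stand-in for cleaning out a slot directory.
void clean_out_dir(const fs::path& slot_dir) {
    assert(fs::is_directory(slot_dir));
    // Collect the entries first so nothing is removed while iterating.
    std::vector<fs::path> entries;
    for (const auto& entry : fs::directory_iterator(slot_dir)) {
        entries.push_back(entry.path());
    }
    for (const auto& p : entries) {
        fs::remove_all(p);
    }
}

// Record the result first, and only then delete the slot contents.
bool finish_task(const fs::path& slot_dir, const fs::path& state_file,
                 const std::string& task) {
    if (!record_completion(state_file, task)) {
        return false;  // keep the slot files so the result is not lost
    }
    clean_out_dir(slot_dir);
    return true;
}
```

With this ordering, an interruption after the record is written leaves stale 
slot files behind at worst, rather than a completed task with no record.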

Attachment: BoincFlush.diff
Description: Binary data
