On Jan 13, 2012, at 5:40 PM, Matt Arsenault wrote:

> For a few percent (< about 2%) of results returned, the returned stderr_txt
> is incorrectly empty (and it has been this way for about a year I think).
> Instead, we just get blank output like this:

I believe I found the problem. After the process completes, the redirected
stderr file isn't necessarily written to disk before the client attempts to
read it, which results in truncated or missing output in some cases. fflush
alone is not sufficient for this. I haven't seen this happen on any of my
systems since I started running with the attached change.
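
For illustration only (this is not the contents of the attached diff): fflush
hands the stdio buffer to the OS, while fsync asks the OS to commit the file
to storage. Below is a minimal POSIX sketch of that difference. The name
write_and_sync is a hypothetical helper, not a BOINC function.

```cpp
#include <stdio.h>
#include <string>
#include <unistd.h>   // fsync (POSIX)

// Hypothetical helper, not part of BOINC: write text to a file and
// ask the OS to commit it to storage before returning.
// Returns 0 on success, -1 on any failure.
int write_and_sync(const char* path, const std::string& text) {
    FILE* f = fopen(path, "w");
    if (!f) return -1;

    bool ok = fwrite(text.data(), 1, text.size(), f) == text.size();

    // fflush moves data from the stdio buffer into the kernel.
    ok = ok && fflush(f) == 0;

    // fsync asks the kernel to push the file's data to the device.
    ok = ok && fsync(fileno(f)) == 0;

    // Always close the stream, even if an earlier step failed.
    ok = (fclose(f) == 0) && ok;
    return ok ? 0 : -1;
}
```

On Windows the rough equivalents would be _commit() on the file descriptor or
FlushFileBuffers() on the file handle.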

I also found a window of opportunity: if the client quits, is killed, or
crashes at the wrong moment, a just-completed task is lost and starts over.
In ACTIVE_TASK::handle_exited_app:

    // ….
    if (!will_restart) {
        copy_output_files();
        int retval = read_stderr_file();
        if (retval) {
            msg_printf(result->project, MSG_INTERNAL_ERROR,
                "read_stderr_file(): %s", boincerror(retval)
            );
        }
        client_clean_out_dir(slot_dir, "handle_exited_app()");
        clear_schedule_backoffs(this);
            // clear scheduling backoffs of jobs waiting for GPU
    }

The slot directory is wiped out immediately after a task completes, but the
state file is not written immediately. If you interrupt the client in between,
there's no record that the task completed, and it starts again from scratch.
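
One possible way to close that window, sketched here as an assumption and not
as anything the client currently does, is to record the completion durably
before the slot directory is removed. Both record_completion and finish_task
are hypothetical names, and the one-line "state file" is a stand-in for the
real client state. The sketch uses C++17 std::filesystem.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for writing the client state file.
// Writes to a temporary file, then renames it over the real one so a
// crash mid-write does not leave a half-written state file.
bool record_completion(const fs::path& state_file, const std::string& task) {
    fs::path tmp = state_file;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        out << task << " done\n";
        out.flush();
        if (!out) return false;
    }
    std::error_code ec;
    fs::rename(tmp, state_file, ec);
    return !ec;
}

// Hypothetical ordering: persist the result first, and only clean the
// slot directory once the record has been written successfully.
bool finish_task(const fs::path& slot_dir, const fs::path& state_file,
                 const std::string& task) {
    if (!record_completion(state_file, task)) return false;
    std::error_code ec;
    fs::remove_all(slot_dir, ec);
    return !ec;
}
```

With this ordering, an interruption between the two steps leaves a stale slot
directory behind instead of losing the finished result. The sketch does not
sync the state file to disk, so it narrows the window without fully closing it.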
BoincFlush.diff
Description: Binary data
