Here's another example of a case where BOINC finds that it can't walk and chew
gum at the same time. The event of interest is:
11/05/2015 18:35:34 | GPUGRID | Computation for task e10s9_e7s6f4-GERARD_FXCXCL12_LIG_6282622-0-1-RND7898_0 finished
Following that, there's a 12-second interval in which neither heartbeats nor GUI
RPC traffic was logged. During that time the Manager's Task tab was frozen: the
elapsed times of running tasks stopped updating. The async_file_debug log flag
was active at the time, but it logged no events.
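For anyone trying to reproduce this: async_file_debug is one of the client's log flags, which live in the <log_flags> section of cc_config.xml in the BOINC data directory. A minimal fragment, assuming no other cc_config.xml options are in use (if the file already exists, merge the flag into it rather than replacing it):

```xml
<cc_config>
  <log_flags>
    <!-- log when asynchronous file operations start and end -->
    <async_file_debug>1</async_file_debug>
  </log_flags>
</cc_config>
```

The client reads cc_config.xml at startup, and `boinccmd --read_cc_config` asks a running client to re-read it.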
These particular GPUGrid tasks produce around 90 MB of upload files, but I
think those files are written directly to the project folder and don't need to
be copied anywhere.
The main log is attached as a file only, not inline.
I'll catch a CMS-dev log later this evening, but after that, I'll be away for a
few days and I'll have to leave the bug-chase until the weekend.
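The earlier report quoted in this thread (all Manager updating stopping for about 90 seconds while a 1.33 GB image was copied into a slot directory) is the classic symptom of a long operation running to completion inside a single-threaded loop. David's note that the client stays responsive while copying a 5 GB file, and the existence of a flag that logs when asynchronous file operations start and end, suggest the client is meant to avoid this by copying large files incrementally. A minimal C sketch of that general technique, a cooperative chunked copy: each pass copies at most one chunk, then control returns so other work can run. Every name here (copy_job, copy_step, run_copy, CHUNK_BYTES, service_gui_and_heartbeats) is hypothetical; this is not BOINC's async file code, and it does not explain the 12-second stall above, during which no async file events were logged.

```c
#include <stdio.h>

/* Hypothetical chunk size: the most data copied per pass of the loop. */
#define CHUNK_BYTES (1024 * 1024)

/* Hypothetical state for one in-progress copy. */
typedef struct {
    FILE *src;
    FILE *dst;
    long long bytes_copied;
} copy_job;

/* Hypothetical stand-in for the client's other duties
   (heartbeats to running apps, replies to GUI RPCs). */
static int poll_count = 0;
static void service_gui_and_heartbeats(void) { poll_count++; }

/* Opens both files. Returns 0 on success, -1 on failure. */
static int copy_start(copy_job *job, const char *src_path, const char *dst_path) {
    job->bytes_copied = 0;
    job->src = fopen(src_path, "rb");
    if (job->src == NULL) return -1;
    job->dst = fopen(dst_path, "wb");
    if (job->dst == NULL) {
        fclose(job->src);
        return -1;
    }
    return 0;
}

/* Copies at most CHUNK_BYTES. Returns 1 if more data may remain,
   0 when the copy is complete, -1 on error. Both files are closed
   once the copy completes or fails. */
static int copy_step(copy_job *job) {
    static char buf[CHUNK_BYTES]; /* one buffer is enough in a single thread */
    size_t n = fread(buf, 1, sizeof buf, job->src);
    int failed = (n > 0 && fwrite(buf, 1, n, job->dst) != n);
    if (!failed) job->bytes_copied += (long long)n;
    if (!failed && n == sizeof buf) return 1;

    /* Short read (end of file or read error) or write error: finish up. */
    if (ferror(job->src)) failed = 1;
    fclose(job->src);
    if (fclose(job->dst) != 0) failed = 1;
    return failed ? -1 : 0;
}

/* Simplified single-threaded loop: copy one chunk, then service other
   work, so no single pass blocks for the whole duration of the copy. */
int run_copy(const char *src_path, const char *dst_path) {
    copy_job job;
    int status;
    if (copy_start(&job, src_path, dst_path) != 0) return -1;
    while ((status = copy_step(&job)) == 1) {
        service_gui_and_heartbeats();
    }
    return status;
}
```

With the 1 MB chunk assumed here, a 1.33 GB copy becomes over a thousand short passes, so the gap between heartbeats is bounded by the time to copy one chunk rather than the whole file. A real client would also interleave other event-loop work, not just a counter.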
On Monday, 11 May 2015, 9:42, Jacob Klein <jacob_w_kl...@msn.com> wrote:
I have seen this problem before, where the UI becomes unresponsive. If I
recall correctly, it happens when a T4T task is being set up (i.e. after
everything has been downloaded). For me, though, I don't recall the problem ever
"screwing over other tasks".
Try this to reproduce it: attach to T4T and get a task. The download may take a
while, so you can "step away" for a bit. Then, once that task is running, abort
it. Downloading the second task should be instantaneous (there's essentially
nothing left to download), but starting that second task should cause the UI to
hang (the Manager shows the "Please wait" message box).
Does that help?
> Date: Sun, 10 May 2015 23:19:24 -0700
> From: da...@ssl.berkeley.edu
> To: r.haselgr...@btopenworld.com; onec...@hotmail.com
> CC: boinc_al...@ssl.berkeley.edu
> Subject: Re: [boinc_alpha] BOINC re-using slot directories without ensuring they're empty
>
> I did some initial testing and couldn't reproduce this;
> the client remains responsive while copying a 5 GB file to a slot dir.
> Does anyone else see this behavior?
>
> While testing this, please set the "async_file_debug" log flag.
> It logs when asynchronous file operations start and end.
>
> -- David
>
> On 10-May-2015 12:31 PM, Richard Haselgrove wrote:
> > One thing that may need attention if very large files become the norm is the
> > single-threaded nature of some parts of the core client. My 1-hour CMS test
> > has just finished, and a new 24-hour test started.
> >
> > I watched this happening, and part of the process is copying a 1.33 GB
> > initial .vmi image file (downloaded previously by BOINC from CERN) from the
> > project directory to the slot directory. This took about 90 seconds: during
> > that time, all Manager updating stopped. I'm sure it's the copying process
> > which inhibited updates: I was watching the slot directory, and the .vmi
> > image file had appeared, but other essential startup files hadn't.
> >
> > When BOINC regained its ability to communicate, three running tasks had
> > exited with the dreaded (and false) 'you may need to reset the project'
> > advice. Inline log follows; because my last log got mangled by my ISP's new
> > mail interface, I'll attach it as a text file as well.
> >
> > 10/05/2015 20:12:56 | LHC@home 1.0 | Computation for task sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1 finished
> > 10/05/2015 20:12:56 | CMS-dev | Starting task CMS_31107_1427806626.783437_0
> > 10/05/2015 20:12:56 | CMS-dev | [cpu_sched] Starting task CMS_31107_1427806626.783437_0 using CMS version 4615 (vbox64) in slot 7
> > 10/05/2015 20:14:25 | climateprediction.net | Task hadam3p_anz_e3g7_2013_1_009760406_0 exited with zero status but no 'finished' file
> > 10/05/2015 20:14:25 | climateprediction.net | If this happens repeatedly you may need to reset the project.
> > 10/05/2015 20:14:25 | NumberFields@home | Task wu_sf3_DS-10x271_Grp503196of682667_0 exited with zero status but no 'finished' file
> > 10/05/2015 20:14:25 | NumberFields@home | If this happens repeatedly you may need to reset the project.
> > 10/05/2015 20:14:25 | SETI@home | Task 05jl12ab.3911.10292.438086664199.12.207_1 exited with zero status but no 'finished' file
> > 10/05/2015 20:14:25 | SETI@home | If this happens repeatedly you may need to reset the project.
> > 10/05/2015 20:14:25 | climateprediction.net | [cpu_sched] Restarting task hadam3p_anz_e3g7_2013_1_009760406_0 using hadam3p_anz version 610 in slot 5
> > 10/05/2015 20:14:25 | NumberFields@home | [cpu_sched] Restarting task wu_sf3_DS-10x271_Grp503196of682667_0 using GetDecics version 200 in slot 0
> > 10/05/2015 20:14:25 | SETI@home | [cpu_sched] Restarting task 05jl12ab.3911.10292.438086664199.12.207_1 using setiathome_v7 version 700 (cuda42) in slot 2
> > 10/05/2015 20:14:27 | LHC@home 1.0 | Started upload of sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1_0
> > 10/05/2015 20:14:30 | LHC@home 1.0 | Finished upload of sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1_0
> >
> > On Sunday, 10 May 2015, 19:59, Seke Rob <onec...@hotmail.com> wrote:
> >
> > Excellent, this is all fixed and tested. The interest is/was that WCG's Clean
> > Energy project was at some point going to run very large models; there was
> > talk of 4-8 GB, IIRC.
> >
> > --SekeRob
> >
> > On May 10, 2015 20:27, Richard Haselgrove <r.haselgr...@btopenworld.com> wrote:
> > CMS only has stock applications configured for delivery to 64-bit platforms.
> > I've made an anonymous platform configuration using the 32-bit VBox Windows
> > wrapper: it has downloaded and is running its first 1-hour task. If that
> > completes successfully (it seems to have reached the fully-operational
> > stage), I'll try a full 24-hour task, which under current operational
> > circumstances should generate a >4 GB file locally.
> >
> > On Sunday, 10 May 2015, 18:28, David Anderson <da...@ssl.berkeley.edu> wrote:
> >
> > NTFS handles files larger than 4 GB, even if the hardware and/or OS is only
> > 32-bit. 32-bit versions of Windows have APIs (like _stat64()) for handling
> > files larger than 4 GB. BOINC needs to use these; we fixed one place where
> > it wasn't.
> >
> > On Unix (Linux and Mac), BOINC uses the regular APIs (like lseek()) but is
> > built with a -D_FILE_OFFSET_BITS=64 flag that makes these functions use
> > 64-bit file sizes and offsets. However, it's possible that BOINC has bugs
> > involving files larger than 4 GB on Unix too. If anyone has a 32-bit Linux
> > system, please test with the CMS project.
> > -- David
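A short C sketch of David's point about -D_FILE_OFFSET_BITS=64 on Unix: defining _FILE_OFFSET_BITS as 64 before any system header (equivalent to the compiler flag) makes off_t 64 bits wide on 32-bit glibc systems, so lseek() and stat() can handle sizes and offsets above 4 GB. The helper names has_64bit_offsets and file_size_bytes are hypothetical, not BOINC functions, and the sketch does not cover the Windows _stat64() path.

```c
/* Equivalent to compiling with -D_FILE_OFFSET_BITS=64. It must appear
   before any system header is included to take effect. */
#define _FILE_OFFSET_BITS 64

#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if off_t (the type used by lseek() and by stat()'s st_size)
   is at least 64 bits wide in this build, 0 otherwise. */
int has_64bit_offsets(void) {
    return sizeof(off_t) >= 8;
}

/* Returns the size in bytes of the file at `path`, or -1 on error.
   With a 64-bit off_t, sizes above 4 GB are not truncated. */
long long file_size_bytes(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        return -1;
    }
    return (long long)st.st_size;
}
```

On 64-bit Linux and on macOS, off_t is already 64 bits, so a check like has_64bit_offsets() only tells you something on a 32-bit build, which is the case David asked people to test.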
> >
> > On 10-May-2015 3:58 AM, --SekeRob wrote:
> > >
> > > Just wondering: with files over 4 GB and a 64-bit library introduced, is it
> > > not a CMS project requirement to run on a 64-bit OS?
> > >
> >
> > _______________________________________________
> > boinc_alpha mailing list
> > boinc_al...@ssl.berkeley.edu
> > http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
> > To unsubscribe, visit the above URL and
> > (near bottom of page) enter your email address.