BTW: the client isn't completely single-threaded;
it uses a separate thread to do CPU throttling.
It would be feasible to also use separate threads
for serving GUI RPC connections,
which would allow the client to remain responsive even while
e.g. copying thousands of files to a slot dir.
-- David
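
The idea above can be sketched roughly as follows. This is not BOINC's code (the client is C++); it is a minimal Python illustration, and every name in it (copy_files, run_demo, the "input_N.dat" files) is hypothetical. A worker thread does the bulk copy while the main loop keeps running, and each pass of that loop stands in for answering one GUI RPC request.

```python
import os
import shutil
import tempfile
import threading


def copy_files(src_dir, dst_dir, done):
    """Copy every file from src_dir to dst_dir, then signal completion."""
    try:
        for name in sorted(os.listdir(src_dir)):
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    finally:
        done.set()  # always release the main loop, even if a copy fails


def run_demo(n_files=200):
    """Return (polls, files_copied) for one background copy.

    polls counts how many times the main loop ran while the copy was
    in progress; each pass stands in for serving one GUI RPC request.
    """
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        # Hypothetical input files standing in for a task's slot contents.
        for i in range(n_files):
            with open(os.path.join(src, f"input_{i}.dat"), "wb") as f:
                f.write(b"x" * 1024)

        done = threading.Event()
        worker = threading.Thread(target=copy_files, args=(src, dst, done))
        worker.start()

        polls = 0
        while True:
            polls += 1  # the main thread is free to do other work here
            if done.wait(timeout=0.001):
                break
        worker.join()
        return polls, len(os.listdir(dst))
```

Calling run_demo() returns the number of main-loop passes and the number of files that arrived in the destination directory. The point of the design is only that the copy no longer runs on the thread that answers requests.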

On 12-May-2015 2:40 AM, Seke Rob wrote:
This reminds me of the Clean Energy Project, Phase 2 (CEP2), and why we have app_config with <max_concurrent> and a default control allowing only 1 'In Progress' task per host. Setting up a slot for this project copies nearly 6700 files [symlinking the static files, as is done on several other WCG projects, was proposed long ago]. If more than one CEP2 task is started, the machine at times feels like a snail and the BOINC Manager responds poorly; less powerful systems often incur 'exited with zero status' errors or fail outright. On an 8-core machine I observed that it could take over an hour before actual computing commenced [CPU time logged]. After a boot, tasks have to be started manually one by one. Kevin Reed raised a ticket for staggered starting a few years ago, as the models can reach several GB, and bigger in future.

At any rate, the 6700 files that are copied also need to be deleted at completion [physical files or symlink references]. The effect of starting one CEP2 task, and of finishing / packaging / zipping and transmitting, can easily be several minutes with no computing at all, just disk whirring, with only elapsed time being logged. The more tasks run, the more the issue compounds, producing what many users see: the series of zero-status exits, with tasks resetting to the start or to the last checkpoint and often hours of computing time lost.
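
To make the copy-versus-symlink point concrete, here is a small Python sketch. It is only an illustration: populate_slot and the directory names are hypothetical, and it says nothing about how BOINC or the WCG applications actually set up a slot. With symlinks the slot holds only links back to the static files, so no file data is duplicated and cleanup has only links to remove.

```python
import os
import shutil


def populate_slot(static_dir, slot_dir, use_symlinks):
    """Make every file in static_dir available under the same name in slot_dir.

    With use_symlinks=False the file data is copied; with use_symlinks=True
    only a symbolic link pointing back at the original is created.
    """
    for name in os.listdir(static_dir):
        src = os.path.join(static_dir, name)
        dst = os.path.join(slot_dir, name)
        if use_symlinks:
            os.symlink(src, dst)  # may need extra privileges on Windows
        else:
            shutil.copyfile(src, dst)
```

Either way the application opens the same file names in the slot directory; only the amount of data written at startup and deleted at completion differs.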

Maybe you'd like to get in touch with your confederates at WCG [Keith Uplinger] to discuss the issue further, as this is now nearing five years of continuous frustration [June 2010 launch], and it is a huge limitation on the speed of progress on this project.

--SekeRob.

On 12-5-2015 1:55, David Anderson wrote:
That delay looks like it's caused by deleting files or by process cleanup.
Does GPUGrid make lots of (non-output) files in the slot dir?

Please try to repro it with slot_debug, task_debug, and heartbeat_debug set
(gui_rpc_debug not needed).

-- David
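
For anyone setting these flags by hand: they normally go in the <log_flags> section of cc_config.xml in the BOINC data directory. The Python helper below is a hypothetical convenience (build_cc_config is not part of BOINC) that just prints such a file for the three flags requested above, assuming the usual cc_config.xml layout. Check it against your existing cc_config.xml before overwriting anything, since that file may also hold an <options> section.

```python
def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag switched on."""
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{name}>1</{name}>" for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_cc_config(["slot_debug", "task_debug", "heartbeat_debug"]))
```

After saving the file, the client has to re-read its config files (or be restarted) for the flags to take effect.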

On 11-May-2015 10:54 AM, Richard Haselgrove wrote:
Here's another example of a case where BOINC finds that it can't walk and chew gum at the same time. The event of interest is

11/05/2015 18:35:34 | GPUGRID | Computation for task e10s9_e7s6f4-GERARD_FXCXCL12_LIG_6282622-0-1-RND7898_0 finished

Following that, there's a 12-second interval where neither heartbeats nor GUI RPC traffic was logged: during that time, the Task tab of the Manager was unchanging, not showing the regular update of elapsed time for running tasks.
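
Silent intervals like this can be picked out of a saved event log mechanically. The Python sketch below assumes the "DD/MM/YYYY HH:MM:SS | project | message" layout seen in the log lines quoted in this thread; find_gaps and its 10-second default threshold are hypothetical choices for illustration, not part of any BOINC tool.

```python
from datetime import datetime


def find_gaps(log_lines, min_gap_seconds=10):
    """Return (start, end, seconds) for each silent interval in the log.

    A silent interval is a pair of consecutive timestamped lines whose
    timestamps are at least min_gap_seconds apart.
    """
    gaps = []
    previous = None
    for line in log_lines:
        stamp_text = line.split(" | ", 1)[0].strip()
        try:
            stamp = datetime.strptime(stamp_text, "%d/%m/%Y %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation line or other text without a timestamp
        if previous is not None:
            delta = (stamp - previous).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((previous, stamp, delta))
        previous = stamp
    return gaps
```

Applied to two consecutive lines from the 10 May log quoted further down this thread (the CMS-dev "Starting task" entry at 20:12:56 and the first climateprediction.net entry at 20:14:25), it reports a single gap of 89 seconds, which matches the "about 90 seconds" of frozen Manager updates described there.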

async_file_debug was active at the time, but found no events to log.

These particular GPUGrid tasks generate around 90 MB of upload files, but I think they are generated directly in the project folder and don't need to be copied anywhere.

Main log as attached file only.

I'll catch a CMS-dev log later this evening, but after that, I'll be away for a few days and I'll have to leave the bug-chase until the weekend.




On Monday, 11 May 2015, 9:42, Jacob Klein <jacob_w_kl...@msn.com> wrote:



    I have seen this problem before, where the UI becomes unresponsive. If I
    recall, it happens when a T4T task is being set up (ie: after everything was
    downloaded). For me, I don't recall the problem ever "screwing over other
    tasks", though.

    Try this to reproduce it: Attach to T4T, and get a task. It may take a while
    to do that download, so you can "step away" for a bit. Then, once that task
    is going, abort it. Downloading the 2nd task should be instantaneous
    (nothing really to download), but instantiation of that 2nd task should
    cause the UI to hang (showing the "Please wait" messagebox in the manager).

    Does that help?
    > Date: Sun, 10 May 2015 23:19:24 -0700
    > From: da...@ssl.berkeley.edu
    > To: r.haselgr...@btopenworld.com; onec...@hotmail.com
    > CC: boinc_al...@ssl.berkeley.edu
    > Subject: Re: [boinc_alpha] BOINC re-using slot directories without ensuring they're empty
    >
    > I did some initial testing and couldn't repro this;
    > the client remains responsive while copying a 5 GB file to a slot dir.
    > Does anyone else see this behavior?
    >
    > While testing this, please set "async_file_debug" log flag.
    > This says when asynchronous file operations start and end.
    >
    > -- David
    >
    > On 10-May-2015 12:31 PM, Richard Haselgrove wrote:
    > > One thing that may need attention if very large files become the norm is the
    > > single-threaded nature of some parts of the core client. My 1-hour CMS test has
    > > just finished, and a new 24-hour test started.
    > >
    > >
    > > I watched this happening, and part of the process is copying a 1.33 GB initial
    > > .vmi image file (downloaded previously by BOINC from CERN) from the project
    > > directory to the slot directory. This took about 90 seconds: during that time, all
    > > Manager updating stopped. I'm sure it's the copying process which inhibited
    > > updates: I was watching the slot directory, and the .vmi image file had appeared,
    > > but other essential startup files hadn't.
    > >
    > >
    > > When BOINC regained its ability to communicate, three running tasks had exited
    > > with the dreaded (and false) 'you may need to reset the project' advice. Inline
    > > log follows: because my last log got mangled by my ISP's new mail interface, I'll
    > > attach it as a text file as well.
    > >
    > >
    > > 10/05/2015 20:12:56 | LHC@home 1.0 | Computation for task sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1 finished
    > > 10/05/2015 20:12:56 | CMS-dev | Starting task CMS_31107_1427806626.783437_0
    > > 10/05/2015 20:12:56 | CMS-dev | [cpu_sched] Starting task CMS_31107_1427806626.783437_0 using CMS version 4615 (vbox64) in slot 7
    > > 10/05/2015 20:14:25 | climateprediction.net | Task hadam3p_anz_e3g7_2013_1_009760406_0 exited with zero status but no 'finished' file
    > > 10/05/2015 20:14:25 | climateprediction.net | If this happens repeatedly you may need to reset the project.
    > > 10/05/2015 20:14:25 | NumberFields@home | Task wu_sf3_DS-10x271_Grp503196of682667_0 exited with zero status but no 'finished' file
    > > 10/05/2015 20:14:25 | NumberFields@home | If this happens repeatedly you may need to reset the project.
    > > 10/05/2015 20:14:25 | SETI@home | Task 05jl12ab.3911.10292.438086664199.12.207_1 exited with zero status but no 'finished' file
    > > 10/05/2015 20:14:25 | SETI@home | If this happens repeatedly you may need to reset the project.
    > > 10/05/2015 20:14:25 | climateprediction.net | [cpu_sched] Restarting task hadam3p_anz_e3g7_2013_1_009760406_0 using hadam3p_anz version 610 in slot 5
    > > 10/05/2015 20:14:25 | NumberFields@home | [cpu_sched] Restarting task wu_sf3_DS-10x271_Grp503196of682667_0 using GetDecics version 200 in slot 0
    > > 10/05/2015 20:14:25 | SETI@home | [cpu_sched] Restarting task 05jl12ab.3911.10292.438086664199.12.207_1 using setiathome_v7 version 700 (cuda42) in slot 2
    > > 10/05/2015 20:14:27 | LHC@home 1.0 | Started upload of sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1_0
    > > 10/05/2015 20:14:30 | LHC@home 1.0 | Finished upload of sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1_0
    > >
    > >
    > >
    > >
    > >
    > > On Sunday, 10 May 2015, 19:59, Seke Rob <onec...@hotmail.com> wrote:
    > >
    > >
    > >
    > >    Excellent that this is all fixed and tested. The interest is/was that WCG's
    > >    Clean Energy Project was at some point to run very large models; talk of
    > >    4-8 GB, IIRC.
    > >
    > >    --SekeRob
    > >
    > >    On May 10, 2015 20:27, Richard Haselgrove <r.haselgr...@btopenworld.com> wrote:
    > >    CMS only has stock applications configured for delivery to 64-bit platforms.
    > >    I've made an anonymous platform configuration using the 32-bit VBox Windows
    > >    wrapper: it has downloaded and is running its first 1-hour task. If that
    > >    completes successfully (it seems to have reached the fully-operational stage),
    > >    I'll try a full 24-hour task, which under current operational circumstances
    > >    should generate a >4 GB file locally.
    > >
    > >
    > >        On Sunday, 10 May 2015, 18:28, David Anderson <da...@ssl.berkeley.edu> wrote:
    > >
    > >
    > >
    > >    NTFS handles > 4GB files, even if the hardware and/or OS is only 32-bit.
    > >    32-bit versions of Windows have APIs (like _stat64()) for handling > 4GB files.
    > >    BOINC needs to use these; we fixed one place where it wasn't.
    > >
    > >    On Unix (Linux and Mac), BOINC uses the regular APIs (like lseek()) but is
    > >    built with a -D_FILE_OFFSET_BITS=64 flag that causes these functions to use
    > >    64-bit file sizes and offsets.
    > >    However, it's possible that BOINC has bugs involving > 4GB files on Unix too.
    > >    If anyone has a 32-bit Linux system, please test with the CMS project.
    > >
    > >    -- David
    > >
    > >    On 10-May-2015 3:58 AM, --SekeRob wrote:
    > >    >
    > >    > Just wondering, with files over 4GB and a 64 bit lib introduced, is it not a CMS
    > >    > project requirement to run on a 64 bit OS?
    > >    >
    > >    >
    > >
    > > _______________________________________________
    > >    boinc_alpha mailing list
    > >    boinc_al...@ssl.berkeley.edu
    > > http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
    > >    To unsubscribe, visit the above URL and
    > >    (near bottom of page) enter your email address.

_______________________________________________
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
