Process Monitor can be used to "watch the things a process does" (you have to 
set up the correct filters, etc.), but I'm not sure whether that includes sleeps. 
If the process is waiting on a file or something, though, it should be able to 
tell you. Worth looking into.

https://technet.microsoft.com/en-us/library/bb896645.aspx

Regards,
Jacob
Date: Mon, 18 May 2015 10:41:16 -0700
From: da...@ssl.berkeley.edu
To: r.haselgr...@btopenworld.com; onec...@hotmail.com; jacob_w_kl...@msn.com
CC: boinc_dev@ssl.berkeley.edu
Subject: Re: [boinc_dev] [boinc_alpha] BOINC re-using slot directories without 
ensuring they're empty

I looked at this and couldn't figure out the source of the 12-sec delay.

In general, delays could happen because

1) the client does something that takes a long time (like copying a 5 GB file)

2) the client sleeps (i.e. calls boinc_sleep()).
   It does this in a few situations,
   like backing off and retrying a file system operation.

But there's no indication that either of these is happening here.

Does Windows have a way of logging the system calls that a process makes
(like strace on Unix)?
If so that might reveal what the client is doing during those 12 seconds.

-- David

    

On 16-May-2015 8:01 AM, Richard Haselgrove wrote:

Here is the message log file for a GPUGrid task finish. The 12-second delay
appears again between 14:26:35 and 14:26:47 - that's after the slot directory
has been cleared, and the exiting task has changed state from 'running' to
'uploading'. Two new tasks have been assigned to the GPU, but their (small)
startup files have not yet been linked to their respective slot directories.

I also attach directory listings for the slot and GPUGrid project folders at
various stages of the cleanup: the slot held 34 files totalling 44,186,727
bytes, which doesn't sound excessive: the largest file deletion (94,783,960
bytes) occurred several minutes later, when that file finished uploading.

I'll enable similar logging and watch what happens when the next GPUGrid task
starts up, but from memory, the disruption to BOINC is less severe at startup.
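
Stalls like the 12-second one described above can be found without reading the
log by eye, by scanning it for large gaps between consecutive timestamps. The
following is only a minimal sketch and not part of BOINC: find_gaps is a
hypothetical helper, and it assumes the "DD/MM/YYYY HH:MM:SS | project | message"
layout seen in the log excerpts quoted in this thread. The two sample lines are
copied from the 10 May log quoted further down.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the log excerpts in this thread.
STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def find_gaps(lines, threshold_s=10.0):
    """Return (previous, current, seconds) for each pair of consecutive
    timestamped lines that are at least threshold_s seconds apart."""
    gaps = []
    previous = None
    for line in lines:
        # The timestamp is everything before the first "|" separator.
        stamp = line.split("|", 1)[0].strip()
        try:
            current = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # wrapped continuation line or unrecognised format
        if previous is not None:
            seconds = (current - previous).total_seconds()
            if seconds >= threshold_s:
                gaps.append((previous, current, seconds))
        previous = current
    return gaps


if __name__ == "__main__":
    sample = [
        "10/05/2015 20:12:56 | CMS-dev | Starting task CMS_31107_1427806626.783437_0",
        "10/05/2015 20:14:25 | climateprediction.net | Task "
        "hadam3p_anz_e3g7_2013_1_009760406_0 exited with zero status but no 'finished' file",
    ]
    for previous, current, seconds in find_gaps(sample):
        print(f"{seconds:.0f} s with nothing logged between "
              f"{previous:%H:%M:%S} and {current:%H:%M:%S}")
```

A gap only shows that nothing was logged in that interval; it does not by itself
say whether the client was blocked, sleeping, or simply idle, so it narrows down
where to look rather than identifying the cause.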
        

        

          

        
        
          
            
On Tuesday, 12 May 2015, 23:29, David Anderson <da...@ssl.berkeley.edu> wrote:

BTW: the client isn't completely single-threaded;
it uses a separate thread to do CPU throttling.
It would be feasible to also use separate threads
for serving GUI RPC connections,
which would allow the client to remain responsive even while
e.g. copying thousands of files to a slot dir.
-- David
                  
On 12-May-2015 2:40 AM, Seke Rob wrote:
> Reminds me of the Clean Energy Project, Phase 2, and why we have app_config
> and <max_concurrent> and a default control allowing 1 'In Progress' on a
> host. This project sets up its slot by copying nearly 6700 files [symlinking
> was proposed long ago, as is done on several other WCG projects for the
> static files]. If more than one CEP2 is started, the machine at times feels
> like a snail and the responsiveness of the BOINC manager is poor, with the
> less powerful systems many a time incurring error zero status exits or total
> failure. On an 8 core, I observed it could take over an hour before actual
> computing commenced [CPU time logged]. A boot cycle requires manually
> starting tasks one by one. Kevin Reed raised a ticket a few years ago for
> staggered starting, as the models can reach several GB and will get bigger
> in future. At any rate, as much as these 6700 files are copied, they then
> also need deleting at completion [physical or symlink references]. The
> effect of starting 1 CEP2 and finishing / packaging / zipping and
> transmitting can easily lead to several minutes of there not being any
> computing, just whirring, with only elapsed time being logged. The more that
> run, the more the issue compounds, with the effect that many incur: the exit
> zero status series, resetting to start or last checkpoint, often with hours
> of computing time lost.
>
> Maybe you'd like to get in touch with your confederates at WCG [Keith
> Uplinger] to discuss the issue further, as this is now nearing a 5 year
> continuous frustration [June 2010 launch, and a huge limitation on the speed
> of progress on this project].
>
> --SekeRob.
>
> On 12-5-2015 1:55, David Anderson wrote:
>> That delay looks like it's caused by deleting files or by process cleanup.
>> Does GPUGrid make lots of (non-output) files in the slot dir?
>>
>> Please try to repro it with slot_debug, task_debug, and heartbeat_debug set
>> (gui_rpc_debug not needed).
>>
>> -- David
                  >>
>> On 11-May-2015 10:54 AM, Richard Haselgrove wrote:
>>> Here's another example of a case where BOINC finds that it can't walk and
>>> chew gum at the same time. The event of interest is
>>>
>>> 11/05/2015 18:35:34 | GPUGRID | Computation for task e10s9_e7s6f4-GERARD_FXCXCL12_LIG_6282622-0-1-RND7898_0 finished
>>>
>>> Following that, there's a 12-second interval where neither heartbeats nor
>>> GUI RPC traffic was logged: during that time, the Task tab of the Manager
>>> was unchanging, not showing the regular update of elapsed time for running
>>> tasks.
>>>
>>> async_file_debug was active at the time, but found no events to log.
>>>
>>> These particular GPUGrid tasks generate around 90 MB of upload files, but I
>>> think they are generated directly in the project folder and don't need to
>>> be copied anywhere.
>>>
>>> Main log as attached file only.
>>>
>>> I'll catch a CMS-dev log later this evening, but after that, I'll be away
>>> for a few days and I'll have to leave the bug-chase until the weekend.
                  >>>
                  >>>
                  >>>
                  >>>
>>> On Monday, 11 May 2015, 9:42, Jacob Klein <jacob_w_kl...@msn.com> wrote:
>>>
>>>    I have seen this problem before, where the UI becomes unresponsive. If I
>>>    recall, it happens when a T4T task is being set up (ie: after everything
>>>    was downloaded). For me, I don't recall the problem ever "screwing over
>>>    other tasks", though.
>>>
>>>    Try this to reproduce it: Attach to T4T, and get a task. It may take a
>>>    while to do that download, so you can "step away" for a bit. Then, once
>>>    that task is going, abort it. Downloading the 2nd task should be
>>>    instantaneous (nothing really to download), but instantiation of that
>>>    2nd task should cause the UI to hang (showing the "Please wait"
>>>    messagebox in the manager).
>>>
>>>    Does that help?
>>>    > Date: Sun, 10 May 2015 23:19:24 -0700
>>>    > From: da...@ssl.berkeley.edu
>>>    > To: r.haselgr...@btopenworld.com; onec...@hotmail.com
>>>    > CC: boinc_al...@ssl.berkeley.edu
>>>    > Subject: Re: [boinc_alpha] BOINC re-using slot directories without
>>>    > ensuring they're empty
>>>    >
>>>    > I did some initial testing and couldn't repro this;
>>>    > the client remains responsive while copying a 5 GB file to a slot dir.
>>>    > Does anyone else see this behavior?
>>>    >
>>>    > While testing this, please set "async_file_debug" log flag.
>>>    > This says when asynchronous file operations start and end.
>>>    >
>>>    > -- David
                  >>>    >
>>>    > On 10-May-2015 12:31 PM, Richard Haselgrove wrote:
>>>    > > One thing that may need attention if very large files become the
>>>    > > norm is the single-threaded nature of some parts of the core
>>>    > > client. My 1-hour CMS test has just finished, and a new 24-hour
>>>    > > test started.
>>>    > >
>>>    > > I watched this happening, and part of the process is copying a
>>>    > > 1.33 GB initial .vmi image file (downloaded previously by BOINC
>>>    > > from CERN) from the project directory to the slot directory. This
>>>    > > took about 90 seconds: during that time, all Manager updating
>>>    > > stopped. I'm sure it's the copying process which inhibited
>>>    > > updates: I was watching the slot directory, and the .vmi image
>>>    > > file had appeared, but other essential startup files hadn't.
>>>    > >
>>>    > > When BOINC regained its ability to communicate, three running
>>>    > > tasks had exited with the dreaded (and false) 'you may need to
>>>    > > reset the project' advice. Inline log follows: because my last log
>>>    > > got mangled by my ISP's new mail interface, I'll attach it as a
>>>    > > text file as well.
>>>    > >
>>>    > > 10/05/2015 20:12:56 | LHC@home 1.0 | Computation for task sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1 finished
>>>    > > 10/05/2015 20:12:56 | CMS-dev | Starting task CMS_31107_1427806626.783437_0
>>>    > > 10/05/2015 20:12:56 | CMS-dev | [cpu_sched] Starting task CMS_31107_1427806626.783437_0 using CMS version 4615 (vbox64) in slot 7
>>>    > > 10/05/2015 20:14:25 | climateprediction.net | Task hadam3p_anz_e3g7_2013_1_009760406_0 exited with zero status but no 'finished' file
>>>    > > 10/05/2015 20:14:25 | climateprediction.net | If this happens repeatedly you may need to reset the project.
>>>    > > 10/05/2015 20:14:25 | NumberFields@home | Task wu_sf3_DS-10x271_Grp503196of682667_0 exited with zero status but no 'finished' file
>>>    > > 10/05/2015 20:14:25 | NumberFields@home | If this happens repeatedly you may need to reset the project.
>>>    > > 10/05/2015 20:14:25 | SETI@home | Task 05jl12ab.3911.10292.438086664199.12.207_1 exited with zero status but no 'finished' file
>>>    > > 10/05/2015 20:14:25 | SETI@home | If this happens repeatedly you may need to reset the project.
>>>    > > 10/05/2015 20:14:25 | climateprediction.net | [cpu_sched] Restarting task hadam3p_anz_e3g7_2013_1_009760406_0 using hadam3p_anz version 610 in slot 5
>>>    > > 10/05/2015 20:14:25 | NumberFields@home | [cpu_sched] Restarting task wu_sf3_DS-10x271_Grp503196of682667_0 using GetDecics version 200 in slot 0
>>>    > > 10/05/2015 20:14:25 | SETI@home | [cpu_sched] Restarting task 05jl12ab.3911.10292.438086664199.12.207_1 using setiathome_v7 version 700 (cuda42) in slot 2
>>>    > > 10/05/2015 20:14:27 | LHC@home 1.0 | Started upload of sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1_0
>>>    > > 10/05/2015 20:14:30 | LHC@home 1.0 | Finished upload of sd_FCChh_bs25_beta30_xing120_int1.0_emit2.0_tunex117.216_tuney118.226_6D_V4__1__s__118.31_117.32__4.1_4.2__6__20_1_sixvf_boinc701_1_0
                  >>>    > >
                  >>>    > >
                  >>>    > >
                  >>>    > >
                  >>>    > >
>>>    > > On Sunday, 10 May 2015, 19:59, Seke Rob <onec...@hotmail.com> wrote:
>>>    > >
>>>    > >    Excellent this is all fixed and tested. Interest is/was that
>>>    > >    WCG's Clean Energy at some point in time was to run very large
>>>    > >    models, talk of 4-8GB IIRC.
>>>    > >
>>>    > >    --SekeRob
                  >>>    > >
>>>    > >    On May 10, 2015 20:27, Richard Haselgrove
>>>    > >    <r.haselgr...@btopenworld.com> wrote:
>>>    > >    CMS only has stock applications configured for delivery to
>>>    > >    64-bit platforms. I've made an anonymous platform configuration
>>>    > >    using the 32-bit VBox Windows wrapper: it has downloaded and is
>>>    > >    running its first 1-hour task. If that completes successfully
>>>    > >    (it seems to have reached the fully-operational stage), I'll
>>>    > >    try a full 24-hour task, which under current operational
>>>    > >    circumstances should generate a >4 GB file locally.
                  >>>    > >
                  >>>    > >
>>>    > >        On Sunday, 10 May 2015, 18:28, David Anderson
>>>    > >        <da...@ssl.berkeley.edu> wrote:
>>>    > >
>>>    > >    NTFS handles > 4GB files, even if the hardware and/or OS is
>>>    > >    only 32-bit. 32-bit versions of Windows have APIs (like
>>>    > >    _stat64()) for handling > 4GB files.
>>>    > >    BOINC needs to use these; we fixed one place where it wasn't.
>>>    > >
>>>    > >    On Unix (Linux and Mac), BOINC uses the regular APIs (like
>>>    > >    lseek()) but is built with a -D_FILE_OFFSET_BITS=64 flag that
>>>    > >    makes these functions use 64-bit file offsets and sizes.
>>>    > >    However, it's possible that BOINC has bugs involving > 4GB
>>>    > >    files on Unix too.
>>>    > >    If anyone has a 32-bit Linux system, please test with the CMS
>>>    > >    project.
>>>    > >
>>>    > >    -- David
                  >>>    > >
>>>    > >    On 10-May-2015 3:58 AM, --SekeRob wrote:
>>>    > >    >
>>>    > >    > Just wondering, with files over 4GB and a 64 bit lib
>>>    > >    > introduced, is it not a CMS project requirement to run on a
>>>    > >    > 64 bit OS?
>>>    > >    >
>>>    > >
>>>    > >    _______________________________________________
>>>    > >    boinc_alpha mailing list
>>>    > >    boinc_al...@ssl.berkeley.edu
>>>    > >    http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
>>>    > >    To unsubscribe, visit the above URL and
>>>    > >    (near bottom of page) enter your email address.
                  >>>
                  >>>    > >
                  >>>    > >
                  >>>    > >
                  >>>    > >
                  >>>    > >
                  >>>    > >
                  >>>    > >
                  >>>    >
                  >>>    >
                  >>>
                  >>>   
                  >>>
                  >>>
                  >>
                  >
                  >
                  >
                  >
                  
                  _______________________________________________
                  boinc_dev mailing list
                  boinc_dev@ssl.berkeley.edu
                  http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
                  
                    To unsubscribe, visit the above URL and
                    (near bottom of page) enter your email address.
                  
                  

                  

                
              
            
          
        
      
    
    
                                          