Rom:

I'm testing 26032 on RNAWorld and just got this:
> 2013-11-09 08:23:12 (4106): vboxwrapper: starting
> 2013-11-09 08:23:12 (4106): Feature: Enabling trickle-ups (Interval:
> 14400.000000)
> 2013-11-09 08:23:12 (4106): Detected: VirtualBox 4.2.16_Debianr86992
> 2013-11-09 08:23:16 (4106): Restore from previously saved snapshot.
> 2013-11-09 08:23:18 (4106): Restore completed.
> 2013-11-09 08:23:18 (4106): Starting VM.
> 2013-11-09 08:23:31 (4106): Successfully started VM.
> 2013-11-09 08:23:31 (4106): Setting cpu throttle for VM. (100%)
> 2013-11-09 08:23:31 (4106): Setting network throttle for VM.
> 2013-11-09 08:38:26 (4106): Status Report: Job Duration: '0.000000',
> Elapsed Time: '10207.734515'
> 2013-11-09 09:47:41 (4106): Status Report: Trickle-Up Event.
> 2013-11-09 09:47:41 (4106): Sending Trickle-Up Event failed (-191).
-191 means ERR_NO_OPTION, which leads us back to a problem from 5 months ago (
http://boinc.berkeley.edu/trac/changeset/655fd5e429442f574124b12ff0396d9df4d42d2f/boinc-v2/samples/vboxwrapper)
that was fixed back then.

You broke this with commit:
http://boinc.berkeley.edu/trac/changeset/4820cb5436cc98c15c39fac58e2220a1db8a8cc4/boinc-v2/samples/vboxwrapper
which once again puts the boinc_init_options() call before
boinc_options.handle_trickle_ups = true; is set. As a result we are
never able to send trickle messages.

I propose changing the init block at line 407 like this:
>     memset(&boinc_options, 0, sizeof(boinc_options));
>     boinc_options.main_program = true;
>     boinc_options.check_heartbeat = true;
>     boinc_options.handle_process_control = true;
>     if (trickle_period > 0.0) {
>         boinc_options.handle_trickle_ups = true;
>     }
>     boinc_init_options(&boinc_options);

That way we still get the printf message after startup, but the correct
option is set before boinc_init_options() is called.
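
To illustrate why the order matters, here is a minimal self-contained
sketch. It does not use the real BOINC API: MockOptions,
mock_init_options() and mock_send_trickle_up() are hypothetical
stand-ins, and the sketch assumes that the library copies the options
at init time, which would explain the -191 in the log above.

```cpp
#include <cstring>

// Hypothetical stand-in for the BOINC options struct; only the fields
// named in this mail are modelled.
struct MockOptions {
    bool main_program;
    bool check_heartbeat;
    bool handle_process_control;
    bool handle_trickle_ups;
};

// Error value as seen in the log above (ERR_NO_OPTION).
const int MOCK_ERR_NO_OPTION = -191;

// Copy of the options held by the mock "library" after init.
static MockOptions g_active_options;

// Mock init. ASSUMPTION: the options are copied here, so later changes
// to the caller's struct are never seen by the library.
int mock_init_options(const MockOptions* opt) {
    g_active_options = *opt;
    return 0;
}

// Mock trickle-up: fails unless trickle handling was enabled at init.
int mock_send_trickle_up() {
    return g_active_options.handle_trickle_ups ? 0 : MOCK_ERR_NO_OPTION;
}

// Fills in the options that are set unconditionally.
static void fill_common(MockOptions* opt) {
    std::memset(opt, 0, sizeof(*opt));
    opt->main_program = true;
    opt->check_heartbeat = true;
    opt->handle_process_control = true;
}

// Broken order: init first, trickle flag afterwards.
int send_with_flag_after_init(double trickle_period) {
    MockOptions opt;
    fill_common(&opt);
    mock_init_options(&opt);
    if (trickle_period > 0.0) {
        opt.handle_trickle_ups = true;  // too late, init already copied opt
    }
    return mock_send_trickle_up();
}

// Proposed order: trickle flag first, then init.
int send_with_flag_before_init(double trickle_period) {
    MockOptions opt;
    fill_common(&opt);
    if (trickle_period > 0.0) {
        opt.handle_trickle_ups = true;
    }
    mock_init_options(&opt);
    return mock_send_trickle_up();
}
```

In this mock, send_with_flag_after_init(14400.0) returns -191, while
send_with_flag_before_init(14400.0) returns 0.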

Regards
Christian


On 08.11.2013 05:14, Rom Walton wrote:
>
> Christian,
>
>  
>
> 26032 is up on http://boinc.berkeley.edu/dl/.
>
>  
>
> It includes:
>
> *VBOX: Add logging in case of a trickle-up failure.*
>
> *VBOX: Adjust the VM process priority right before a suspend command
> to speed up how quickly the VM is suspended.*
>
> *----- Rom*
>
>  
>
> *From:*Christian Beer [mailto:[email protected]]
> *Sent:* Thursday, November 07, 2013 6:45 AM
> *To:* Rom Walton
> *Cc:* BOINCDev Mailing List; David Anderson (BOINC); Lammert van der Veen
> *Subject:* Re: ongoing problems with vboxwrapper
>
>  
>
> Hi Rom,
>
> can you please move the new trickle status printf down after
> boinc_send_trickle_up() and check its return value? That is more
> helpful when a trickle was not sent, because the reason then shows
> up in the log. I will deploy this new version on RNAWorld and then
> create some long-running tasks.
>
> @David: Is there an easy way for me to assign these results to specific
> users? I want to focus on users that I can contact in our forum to
> check stderr.txt during runtime.
>
> Regards
> Christian
>
> On 07.11.2013 00:53, Rom Walton wrote:
>
>     I've posted 26031 to http://boinc.berkeley.edu/dl/.
>
>      
>
>     It contains the following changes:
>
>     VBOX: Use the same technique for calculating when to report a
>     trickle as we use for performing checkpoints.
>
>     VBOX: Add a trickle-up status report entry to stderr.txt every
>     time we send a trickle event.
>
>     VBOX: Add VirtualBox 4.3.0 to bad builds list.
>
>     VBOX: We only need to filter the vboxmanage output in one place.
>
>     VBOX: Add additional check to determine if the get VM log command
>     really failed.
>
>      
>
>     ----- Rom
>
>      
>
>     *From:*Christian Beer [mailto:[email protected]]
>     *Sent:* Wednesday, November 06, 2013 11:29 AM
>     *To:* Rom Walton
>     *Cc:* BOINCDev Mailing List; David Anderson (BOINC); Lammert van
>     der Veen
>     *Subject:* Re: ongoing problems with vboxwrapper
>
>      
>
>     A possible explanation: the current control script is buggy and
>     does not redirect the scientific app's stdout and stderr to files,
>     so they end up in the VM log. But this happens on all tasks, not
>     just on some.
>
>     Regards
>     Christian
>
>     On 06.11.2013 17:24, Rom Walton wrote:
>
>         Ah, so we are tripping up on the new code to check for
>         EXIT_OUT_OF_MEMORY. Fun fun fun.
>
>          
>
>         Okay, I'll commit a change for this.
>
>          
>
>         I'm not sure why vboxmanage would be returning a non-zero exit
>         status in this situation.
>
>          
>
>         ----- Rom
>
>          
>
>         *From:*Christian Beer [mailto:[email protected]]
>         *Sent:* Wednesday, November 06, 2013 11:11 AM
>         *To:* Rom Walton
>         *Cc:* BOINCDev Mailing List; David Anderson (BOINC); Lammert
>         van der Veen
>         *Subject:* Re: ongoing problems with vboxwrapper
>
>          
>
>         The general pattern seems to be this (see the user's post:
>         https://www.rechenkraft.net/forum/viewtopic.php?f=76&t=13059&start=180#p143176):
>
>
>
>         2013-10-17 11:21:40 (816): Creating new snapshot for VM.
>         2013-10-17 11:21:48 (816): Deleting stale snapshot.
>         2013-10-17 11:21:49 (816): Checkpoint completed.
>         2013-10-17 11:26:42 (816): Error in get vm log for VM: 3
>         Arguments:
>         showvminfo "boinc_fb76a72fc6655131" --log 0 
>         Output:
>         VirtualBox VM 4.2.16 r86992 win.amd64 (Jul  4 2013 15:51:44)
>         release log
>         00:00:00.042649 Log opened 2013-10-16T18:39:34.043413700Z
>
>         followed by the actual VM Log (rather long) and this over and
>         over:
>
>
>
>         05:44:49.794013 ********************* End of CFGM dump
>         **********************
>         05:44:50.074438 Changing the VM state from 'SUSPENDED' to
>         'RESUMING'.
>         05:44:50.074495 Changing the VM state from 'RESUMING' to
>         'RUNNING'.
>         05:44:55.318224 Changing the VM state from 'RUNNING' to
>         'SUSPENDING'.
>         05:44:55.774983 PDMR3Suspend: 456 736 124 ns run time
>         05:44:55.775003 Changing the VM state from 'SUSPENDING' to
>         'SUSPENDED'.
>         05:44:55.775847 DrvBlock: Flushes will be ignored
>         05:44:55.775855 DrvBlock: Async flushes will be passed to the disk
>         05:44:55.776116 VD: Opening the disk took 236410 ns
>         05:44:55.776131 PIIX3 ATA: LUN#0: disk, PCHS=4161/16/63, total
>         number of sectors 4194304
>         05:44:55.776139 ************************* CFGM dump
>         *************************
>         05:44:55.776140 [/Devices/piix3ide/0/] (level 0)
>         05:44:55.776142   PCIBusNo      <integer> = 0x0000000000000000 (0)
>         05:44:55.776145   PCIDeviceNo   <integer> = 0x0000000000000001 (1)
>         05:44:55.776146   PCIFunctionNo <integer> = 0x0000000000000001 (1)
>         05:44:55.776147   Trusted       <integer> = 0x0000000000000001 (1)
>         05:44:55.776148 
>         05:44:55.776149 [/Devices/piix3ide/0/Config/] (level 1)
>         (restricted root)
>         05:44:55.776151   Type <string>  = "PIIX4" (cb=6)
>         05:44:55.776152 
>         05:44:55.776153 [/Devices/piix3ide/0/Config/PrimaryMaster/]
>         (level 2)
>         05:44:55.776155   NonRotationalMedium <integer> =
>         0x0000000000000000 (0)
>         05:44:55.776156 
>         05:44:55.776156 [/Devices/piix3ide/0/LUN#0/] (level 1)
>         05:44:55.776158   Driver <string>  = "Block" (cb=6)
>         05:44:55.776159 
>         05:44:55.776159 [/Devices/piix3ide/0/LUN#0/AttachedDriver/]
>         (level 2)
>         05:44:55.776161   Driver <string>  = "VD" (cb=3)
>         05:44:55.776162 
>         05:44:55.776163
>         [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/] (level 3)
>         (restricted root)
>         05:44:55.776165   Format     <string>  = "VDI" (cb=4)
>         05:44:55.776166   Path       <string>  =
>         "D:\ProgramData\BOINC\slots\9\boinc_fb76a72fc6655131\Snapshots\{650bac36-f84e-43c7-b30c-c8a078244a51}.vdi"
>         (cb=105)
>         05:44:55.776167   SetupMerge <integer> = 0x0000000000000001 (1)
>         05:44:55.776168   Type       <string>  = "HardDisk" (cb=9)
>         05:44:55.776169 
>         05:44:55.776170
>         [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/Parent/]
>         (level 4)
>         05:44:55.776172   Format      <string>  = "VDI" (cb=4)
>         05:44:55.776173   MergeSource <integer> = 0x0000000000000001 (1)
>         05:44:55.776174   Path        <string>  =
>         "D:\ProgramData\BOINC\slots\9\boinc_fb76a72fc6655131\Snapshots\{ed98fbf7-ab54-4f4c-97d9-ec954d59d419}.vdi"
>         (cb=105)
>         05:44:55.776176 
>         05:44:55.776176
>         [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/Parent/Parent/]
>         (level 5)
>         05:44:55.776178   Format      <string>  = "VDI" (cb=4)
>         05:44:55.776179   MergeTarget <integer> = 0x0000000000000001 (1)
>         05:44:55.776180   Path        <string>  =
>         "D:\ProgramData\BOINC\slots\9\vm_image.vdi" (cb=42)
>         05:44:55.776182 
>         05:44:55.776182 [/Devices/piix3ide/0/LUN#0/Config/] (level 2)
>         (restricted root)
>         05:44:55.776184   Mountable <integer> = 0x0000000000000000 (0)
>         05:44:55.776185   Type      <string>  = "HardDisk" (cb=9)
>         05:44:55.776186 
>         05:44:55.776187 [/Devices/piix3ide/0/LUN#999/] (level 1)
>         05:44:55.776188   Driver <string>  = "MainStatus" (cb=11)
>         05:44:55.776189 
>         05:44:55.776190 [/Devices/piix3ide/0/LUN#999/Config/] (level
>         2) (restricted root)
>         05:44:55.776192   DeviceInstance        <string>  =
>         "piix3ide/0" (cb=11)
>         05:44:55.776193   First                 <integer> =
>         0x0000000000000000 (0)
>         05:44:55.776194   Last                  <integer> =
>         0x0000000000000003 (3)
>         05:44:55.776196   pConsole              <integer> =
>         0x0000000001cc8280 (30 179 968)
>         05:44:55.776198   papLeds               <integer> =
>         0x0000000001cc8598 (30 180 760)
>         05:44:55.776200   pmapMediumAttachments <integer> =
>         0x0000000001cc88a0 (30 181 536)
>         05:44:55.776201 
>         05:44:55.776202 ********************* End of CFGM dump
>         **********************
>         05:44:55.776213 Changing the VM state from 'SUSPENDED' to
>         'RESUMING'.
>
>         Another user reported that after a host restart the growth of
>         stderr.txt was normal again.
>
>         Regards
>         Christian
>
>         On 06.11.2013 16:59, Rom Walton wrote:
>
>             Lammert, what did you discover?
>
>              
>
>             Christian, do you happen to know what kind of messages
>             stderr.txt was filled with?
>
>              
>
>             Vboxwrapper uses wall clock time internally.  I'll see
>             what I can find about the trickle messages.
>
>              
>
>             ----- Rom
>
>              
>
>             *From:*Christian Beer [mailto:[email protected]]
>             *Sent:* Wednesday, November 06, 2013 10:30 AM
>             *To:* BOINCDev Mailing List
>             *Cc:* Rom Walton; David Anderson (BOINC); Lammert van der Veen
>             *Subject:* ongoing problems with vboxwrapper
>
>              
>
>             Hello,
>
>             we have been running version 26028 of the vboxwrapper
>             for some time now and I want to update you on some
>             ongoing problems.
>
>             Some users reported that stderr.txt fills up with lots
>             of error messages and grows to several GB. The file was
>             truncated by the user and I didn't see any unusual
>             disk_size_limit_reached errors. So either this was an
>             isolated incident or the file size doesn't matter.
>
>             Many users reported that the VM is still running after the
>             BOINC Client was shut down. Lammert van der Veen did some
>             research into the cause, and I hope this can be fixed by
>             limiting each host to one concurrent VM and by the 26031
>             wrapper, as soon as I upgrade our application.
>
>             Trickle messages were working fine when running with short
>             tasks. Now that we have some longer tasks the trickle up
>             messages stopped. We didn't receive any in over a month. I
>             think I have an explanation for this:
>             In the vboxwrapper the trickles are generated every X
>             seconds of cpu_time, not wall_clock_time, and since the
>             vboxwrapper is not doing much, the cpu_time increases very
>             slowly. What I want is a trickle message every X hours of
>             VM runtime! Please look into this asap, because without
>             this feature I have to monitor the deadlines and extend
>             them by hand.
>             In case it helps:
>             A recent long-running task reported back with
>             cpu_time=793517.4 and elapsed_time=839064.946827, but I
>             also have a task with cpu_time=4280.246 and
>             elapsed_time=378984.324896. I can't see any trickle
>             messages for either of them.
>             first:
>             http://www.rnaworld.de/rnaworld/result.php?resultid=14920843
>             (Job Duration is always 0, Elapsed time is increasing)
>             second:
>             http://www.rnaworld.de/rnaworld/result.php?resultid=14921349
>             (can't see anything in stderr)
>
>             Speaking of deadlines, it would also be great to update
>             the deadline on the client. I know of one user who updates
>             his client_state.xml by hand to prevent the Client from
>             going into high priority mode for RNAWorld when there is
>             no need.
>
>             Regards
>             Christian
>
>          
>
>      
>
>  
>

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
