Hi Rom,

Can you please move the new trickle status printf down after
boinc_send_trickle_up() and check its return value? That is more
helpful: if the trickle doesn't get sent, the reason will be in the
log. I will deploy this new version on RNAWorld and then create some
long-running tasks.

@David: Is there an easy way for me to assign these results to specific
users? I want to focus on users I can contact in our forum, so they can
check stderr.txt during runtime.

Regards
Christian

On 07.11.2013 00:53, Rom Walton wrote:
>
> I've posted 26031 to http://boinc.berkeley.edu/dl/.
>
>  
>
> It contains the following changes:
>
> VBOX: Use the same technique for calculating when to report a trickle
> as we use for performing checkpoints.
>
> VBOX: Add a trickle-up status report entry to stderr.txt every time we
> send a trickle event.
>
> VBOX: Add VirtualBox 4.3.0 to bad builds list.
>
> VBOX: We only need to filter the vboxmanage output in one place.
>
> VBOX: Add additional check to determine if the get VM log command
> really failed.
>
>  
>
> ----- Rom
>
>  
>
> *From:* Christian Beer [mailto:[email protected]]
> *Sent:* Wednesday, November 06, 2013 11:29 AM
> *To:* Rom Walton
> *Cc:* BOINCDev Mailing List; David Anderson (BOINC); Lammert van der Veen
> *Subject:* Re: ongoing problems with vboxwrapper
>
>  
>
> A possible explanation: the current control script is buggy and does
> not redirect the scientific app's stdout and stderr to files, so that
> output ends up in the VM log. But this is happening on all tasks, not
> just on some.
>
> Regards
> Christian
>
> On 06.11.2013 17:24, Rom Walton wrote:
>
>     Ah, so we are tripping up on the new code to check for
>     EXIT_OUT_OF_MEMORY. Fun fun fun.
>
>      
>
>     Okay, I'll commit a change for this.
>
>      
>
>     I'm not sure why vboxmanage would be returning a non-zero exit
>     status in this situation.
>
>      
>
>     ----- Rom
>
>      
>
>     *From:* Christian Beer [mailto:[email protected]]
>     *Sent:* Wednesday, November 06, 2013 11:11 AM
>     *To:* Rom Walton
>     *Cc:* BOINCDev Mailing List; David Anderson (BOINC); Lammert van
>     der Veen
>     *Subject:* Re: ongoing problems with vboxwrapper
>
>      
>
>     The general pattern seems to be this (see the user's post:
>     https://www.rechenkraft.net/forum/viewtopic.php?f=76&t=13059&start=180#p143176):
>
>
>     2013-10-17 11:21:40 (816): Creating new snapshot for VM.
>     2013-10-17 11:21:48 (816): Deleting stale snapshot.
>     2013-10-17 11:21:49 (816): Checkpoint completed.
>     2013-10-17 11:26:42 (816): Error in get vm log for VM: 3
>     Arguments:
>     showvminfo "boinc_fb76a72fc6655131" --log 0 
>     Output:
>     VirtualBox VM 4.2.16 r86992 win.amd64 (Jul  4 2013 15:51:44) release log
>     00:00:00.042649 Log opened 2013-10-16T18:39:34.043413700Z
>
>     This is followed by the actual VM log (rather long) and then this,
>     over and over:
>
>
>     05:44:49.794013 ********************* End of CFGM dump **********************
>     05:44:50.074438 Changing the VM state from 'SUSPENDED' to 'RESUMING'.
>     05:44:50.074495 Changing the VM state from 'RESUMING' to 'RUNNING'.
>     05:44:55.318224 Changing the VM state from 'RUNNING' to 'SUSPENDING'.
>     05:44:55.774983 PDMR3Suspend: 456 736 124 ns run time
>     05:44:55.775003 Changing the VM state from 'SUSPENDING' to 'SUSPENDED'.
>     05:44:55.775847 DrvBlock: Flushes will be ignored
>     05:44:55.775855 DrvBlock: Async flushes will be passed to the disk
>     05:44:55.776116 VD: Opening the disk took 236410 ns
>     05:44:55.776131 PIIX3 ATA: LUN#0: disk, PCHS=4161/16/63, total number of sectors 4194304
>     05:44:55.776139 ************************* CFGM dump *************************
>     05:44:55.776140 [/Devices/piix3ide/0/] (level 0)
>     05:44:55.776142   PCIBusNo      <integer> = 0x0000000000000000 (0)
>     05:44:55.776145   PCIDeviceNo   <integer> = 0x0000000000000001 (1)
>     05:44:55.776146   PCIFunctionNo <integer> = 0x0000000000000001 (1)
>     05:44:55.776147   Trusted       <integer> = 0x0000000000000001 (1)
>     05:44:55.776148 
>     05:44:55.776149 [/Devices/piix3ide/0/Config/] (level 1) (restricted root)
>     05:44:55.776151   Type <string>  = "PIIX4" (cb=6)
>     05:44:55.776152 
>     05:44:55.776153 [/Devices/piix3ide/0/Config/PrimaryMaster/] (level 2)
>     05:44:55.776155   NonRotationalMedium <integer> = 0x0000000000000000 (0)
>     05:44:55.776156 
>     05:44:55.776156 [/Devices/piix3ide/0/LUN#0/] (level 1)
>     05:44:55.776158   Driver <string>  = "Block" (cb=6)
>     05:44:55.776159 
>     05:44:55.776159 [/Devices/piix3ide/0/LUN#0/AttachedDriver/] (level 2)
>     05:44:55.776161   Driver <string>  = "VD" (cb=3)
>     05:44:55.776162 
>     05:44:55.776163 [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/] (level 3) (restricted root)
>     05:44:55.776165   Format     <string>  = "VDI" (cb=4)
>     05:44:55.776166   Path       <string>  = "D:\ProgramData\BOINC\slots\9\boinc_fb76a72fc6655131\Snapshots\{650bac36-f84e-43c7-b30c-c8a078244a51}.vdi" (cb=105)
>     05:44:55.776167   SetupMerge <integer> = 0x0000000000000001 (1)
>     05:44:55.776168   Type       <string>  = "HardDisk" (cb=9)
>     05:44:55.776169 
>     05:44:55.776170 [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/Parent/] (level 4)
>     05:44:55.776172   Format      <string>  = "VDI" (cb=4)
>     05:44:55.776173   MergeSource <integer> = 0x0000000000000001 (1)
>     05:44:55.776174   Path        <string>  = "D:\ProgramData\BOINC\slots\9\boinc_fb76a72fc6655131\Snapshots\{ed98fbf7-ab54-4f4c-97d9-ec954d59d419}.vdi" (cb=105)
>     05:44:55.776176 
>     05:44:55.776176 [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/Parent/Parent/] (level 5)
>     05:44:55.776178   Format      <string>  = "VDI" (cb=4)
>     05:44:55.776179   MergeTarget <integer> = 0x0000000000000001 (1)
>     05:44:55.776180   Path        <string>  = "D:\ProgramData\BOINC\slots\9\vm_image.vdi" (cb=42)
>     05:44:55.776182 
>     05:44:55.776182 [/Devices/piix3ide/0/LUN#0/Config/] (level 2) (restricted root)
>     05:44:55.776184   Mountable <integer> = 0x0000000000000000 (0)
>     05:44:55.776185   Type      <string>  = "HardDisk" (cb=9)
>     05:44:55.776186 
>     05:44:55.776187 [/Devices/piix3ide/0/LUN#999/] (level 1)
>     05:44:55.776188   Driver <string>  = "MainStatus" (cb=11)
>     05:44:55.776189 
>     05:44:55.776190 [/Devices/piix3ide/0/LUN#999/Config/] (level 2) (restricted root)
>     05:44:55.776192   DeviceInstance        <string>  = "piix3ide/0" (cb=11)
>     05:44:55.776193   First                 <integer> = 0x0000000000000000 (0)
>     05:44:55.776194   Last                  <integer> = 0x0000000000000003 (3)
>     05:44:55.776196   pConsole              <integer> = 0x0000000001cc8280 (30 179 968)
>     05:44:55.776198   papLeds               <integer> = 0x0000000001cc8598 (30 180 760)
>     05:44:55.776200   pmapMediumAttachments <integer> = 0x0000000001cc88a0 (30 181 536)
>     05:44:55.776201 
>     05:44:55.776202 ********************* End of CFGM dump **********************
>     05:44:55.776213 Changing the VM state from 'SUSPENDED' to 'RESUMING'.
>
>     Another user reported that after a host restart the growth of
>     stderr.txt was normal again.
>
>     Regards
>     Christian
>
>     On 06.11.2013 16:59, Rom Walton wrote:
>
>         Lammert, what did you discover?
>
>          
>
>         Christian, do you happen to know what kind of messages
>         stderr.txt was filled with?
>
>          
>
>         Vboxwrapper uses wall clock time internally.  I'll see what I
>         can find about the trickle messages.
>
>          
>
>         ----- Rom
>
>          
>
>         *From:* Christian Beer [mailto:[email protected]]
>         *Sent:* Wednesday, November 06, 2013 10:30 AM
>         *To:* BOINCDev Mailing List
>         *Cc:* Rom Walton; David Anderson (BOINC); Lammert van der Veen
>         *Subject:* ongoing problems with vboxwrapper
>
>          
>
>         Hello,
>
>         We have been running version 26028 of vboxwrapper for some
>         time now, and I want to update you on some ongoing problems.
>
>         Some users reported that stderr.txt fills up with lots of
>         error messages and grows to several GB. The file was
>         truncated by the user, and I didn't see any unusual
>         disk_size_limit_reached errors, so either this was an isolated
>         incident or the file size doesn't matter.
>
>         Many users reported that the VM keeps running after the
>         BOINC client has been shut down. Lammert van der Veen did some
>         research into the cause, and I hope this can be fixed by
>         limiting each host to one concurrent VM and by using the 26031
>         wrapper as soon as I upgrade our application.
>
>         Trickle messages worked fine with short tasks. Now that we
>         have some longer tasks, the trickle-up messages have stopped;
>         we haven't received any in over a month. I think I have an
>         explanation for this:
>         vboxwrapper generates a trickle every X seconds of cpu_time,
>         not wall_clock_time, and since vboxwrapper itself does little
>         work, its cpu_time increases very slowly. What I want is a
>         trickle message every X hours of VM runtime! Please look into
>         this ASAP, because without this feature I have to monitor the
>         deadlines and extend them by hand.
>         In case it helps:
>         A recent long-running task reported back with
>         cpu_time=793517.4 and elapsed_time=839064.946827, but I also
>         have a task with cpu_time=4280.246 and
>         elapsed_time=378984.324896. I can't see any trickle messages
>         for either of them.
>         First:
>         http://www.rnaworld.de/rnaworld/result.php?resultid=14920843
>         (Job Duration is always 0 while Elapsed time keeps increasing)
>         Second:
>         http://www.rnaworld.de/rnaworld/result.php?resultid=14921349
>         (can't see anything in stderr)
>
>         Speaking of deadlines, it would also be great if the deadline
>         could be updated on the client. I know of one user who edits
>         his client_state.xml by hand to keep the client from going
>         into high-priority mode for RNAWorld when there is no need.
>
>         Regards
>         Christian
>
>      
>
>  
>

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
