A possible explanation: the current control script is buggy and does not
redirect the scientific app's stdout and stderr to files, so the output
ends up in the VM log. But this happens on all tasks, not just on some.
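
As an illustration of the kind of redirection meant here, a minimal
sketch in Python. The actual control script is not shown in this thread,
so this is not its code; `run_app` and the two file names are
hypothetical.

```python
import subprocess


def run_app(cmd, stdout_path="app_stdout.txt", stderr_path="app_stderr.txt"):
    """Run cmd with stdout and stderr appended to separate files.

    Hypothetical helper: the file names are placeholders, not the names
    used by the real control script. Returns the child's exit status.
    """
    # Open in append mode so output from earlier runs is preserved.
    with open(stdout_path, "ab") as out, open(stderr_path, "ab") as err:
        # The child writes directly to the files, so nothing it prints
        # reaches the console (or whatever log captures the console).
        return subprocess.call(cmd, stdout=out, stderr=err)
```

Called as `run_app(["./science_app", "--input", "job.dat"])` (again a
made-up command line), the app's output would land in the two files
instead of on the console.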

Regards
Christian

On 06.11.2013 17:24, Rom Walton wrote:
>
> Ah, so we are tripping up on the new code to check for
> EXIT_OUT_OF_MEMORY. Fun fun fun.
>
> Okay, I'll commit a change for this.
>
> I'm not sure why vboxmanage would be returning a non-zero exit status
> in this situation.
>
> ----- Rom
>
> *From:*Christian Beer [mailto:[email protected]]
> *Sent:* Wednesday, November 06, 2013 11:11 AM
> *To:* Rom Walton
> *Cc:* BOINCDev Mailing List; David Anderson (BOINC); Lammert van der Veen
> *Subject:* Re: ongoing problems with vboxwrapper
>
> The general pattern seems to be this (see the user's post:
> https://www.rechenkraft.net/forum/viewtopic.php?f=76&t=13059&start=180#p143176):
>
> 2013-10-17 11:21:40 (816): Creating new snapshot for VM.
> 2013-10-17 11:21:48 (816): Deleting stale snapshot.
> 2013-10-17 11:21:49 (816): Checkpoint completed.
> 2013-10-17 11:26:42 (816): Error in get vm log for VM: 3
> Arguments:
> showvminfo "boinc_fb76a72fc6655131" --log 0 
> Output:
> VirtualBox VM 4.2.16 r86992 win.amd64 (Jul  4 2013 15:51:44) release log
> 00:00:00.042649 Log opened 2013-10-16T18:39:34.043413700Z
>
> followed by the actual VM log (rather long), and then this over and over:
>
> 05:44:49.794013 ********************* End of CFGM dump
> **********************
> 05:44:50.074438 Changing the VM state from 'SUSPENDED' to 'RESUMING'.
> 05:44:50.074495 Changing the VM state from 'RESUMING' to 'RUNNING'.
> 05:44:55.318224 Changing the VM state from 'RUNNING' to 'SUSPENDING'.
> 05:44:55.774983 PDMR3Suspend: 456 736 124 ns run time
> 05:44:55.775003 Changing the VM state from 'SUSPENDING' to 'SUSPENDED'.
> 05:44:55.775847 DrvBlock: Flushes will be ignored
> 05:44:55.775855 DrvBlock: Async flushes will be passed to the disk
> 05:44:55.776116 VD: Opening the disk took 236410 ns
> 05:44:55.776131 PIIX3 ATA: LUN#0: disk, PCHS=4161/16/63, total number
> of sectors 4194304
> 05:44:55.776139 ************************* CFGM dump
> *************************
> 05:44:55.776140 [/Devices/piix3ide/0/] (level 0)
> 05:44:55.776142   PCIBusNo      <integer> = 0x0000000000000000 (0)
> 05:44:55.776145   PCIDeviceNo   <integer> = 0x0000000000000001 (1)
> 05:44:55.776146   PCIFunctionNo <integer> = 0x0000000000000001 (1)
> 05:44:55.776147   Trusted       <integer> = 0x0000000000000001 (1)
> 05:44:55.776148 
> 05:44:55.776149 [/Devices/piix3ide/0/Config/] (level 1) (restricted root)
> 05:44:55.776151   Type <string>  = "PIIX4" (cb=6)
> 05:44:55.776152 
> 05:44:55.776153 [/Devices/piix3ide/0/Config/PrimaryMaster/] (level 2)
> 05:44:55.776155   NonRotationalMedium <integer> = 0x0000000000000000 (0)
> 05:44:55.776156 
> 05:44:55.776156 [/Devices/piix3ide/0/LUN#0/] (level 1)
> 05:44:55.776158   Driver <string>  = "Block" (cb=6)
> 05:44:55.776159 
> 05:44:55.776159 [/Devices/piix3ide/0/LUN#0/AttachedDriver/] (level 2)
> 05:44:55.776161   Driver <string>  = "VD" (cb=3)
> 05:44:55.776162 
> 05:44:55.776163 [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/]
> (level 3) (restricted root)
> 05:44:55.776165   Format     <string>  = "VDI" (cb=4)
> 05:44:55.776166   Path       <string>  =
> "D:\ProgramData\BOINC\slots\9\boinc_fb76a72fc6655131\Snapshots\{650bac36-f84e-43c7-b30c-c8a078244a51}.vdi"
> (cb=105)
> 05:44:55.776167   SetupMerge <integer> = 0x0000000000000001 (1)
> 05:44:55.776168   Type       <string>  = "HardDisk" (cb=9)
> 05:44:55.776169 
> 05:44:55.776170
> [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/Parent/] (level 4)
> 05:44:55.776172   Format      <string>  = "VDI" (cb=4)
> 05:44:55.776173   MergeSource <integer> = 0x0000000000000001 (1)
> 05:44:55.776174   Path        <string>  =
> "D:\ProgramData\BOINC\slots\9\boinc_fb76a72fc6655131\Snapshots\{ed98fbf7-ab54-4f4c-97d9-ec954d59d419}.vdi"
> (cb=105)
> 05:44:55.776176 
> 05:44:55.776176
> [/Devices/piix3ide/0/LUN#0/AttachedDriver/Config/Parent/Parent/] (level 5)
> 05:44:55.776178   Format      <string>  = "VDI" (cb=4)
> 05:44:55.776179   MergeTarget <integer> = 0x0000000000000001 (1)
> 05:44:55.776180   Path        <string>  =
> "D:\ProgramData\BOINC\slots\9\vm_image.vdi" (cb=42)
> 05:44:55.776182 
> 05:44:55.776182 [/Devices/piix3ide/0/LUN#0/Config/] (level 2)
> (restricted root)
> 05:44:55.776184   Mountable <integer> = 0x0000000000000000 (0)
> 05:44:55.776185   Type      <string>  = "HardDisk" (cb=9)
> 05:44:55.776186 
> 05:44:55.776187 [/Devices/piix3ide/0/LUN#999/] (level 1)
> 05:44:55.776188   Driver <string>  = "MainStatus" (cb=11)
> 05:44:55.776189 
> 05:44:55.776190 [/Devices/piix3ide/0/LUN#999/Config/] (level 2)
> (restricted root)
> 05:44:55.776192   DeviceInstance        <string>  = "piix3ide/0" (cb=11)
> 05:44:55.776193   First                 <integer> = 0x0000000000000000 (0)
> 05:44:55.776194   Last                  <integer> = 0x0000000000000003 (3)
> 05:44:55.776196   pConsole              <integer> = 0x0000000001cc8280
> (30 179 968)
> 05:44:55.776198   papLeds               <integer> = 0x0000000001cc8598
> (30 180 760)
> 05:44:55.776200   pmapMediumAttachments <integer> = 0x0000000001cc88a0
> (30 181 536)
> 05:44:55.776201 
> 05:44:55.776202 ********************* End of CFGM dump
> **********************
> 05:44:55.776213 Changing the VM state from 'SUSPENDED' to 'RESUMING'.
>
> Another user reported that after a host restart the growth of
> stderr.txt was normal again.
>
> Regards
> Christian
>
> On 06.11.2013 16:59, Rom Walton wrote:
>
>     Lammert, what did you discover?
>
>     Christian, do you happen to know what kind of messages stderr.txt
>     was filled with?
>
>     Vboxwrapper uses wall clock time internally.  I'll see what I can
>     find about the trickle messages.
>
>     ----- Rom
>
>     *From:*Christian Beer [mailto:[email protected]]
>     *Sent:* Wednesday, November 06, 2013 10:30 AM
>     *To:* BOINCDev Mailing List
>     *Cc:* Rom Walton; David Anderson (BOINC); Lammert van der Veen
>     *Subject:* ongoing problems with vboxwrapper
>
>     Hello,
>
>     We have been running version 26028 of the vboxwrapper for some
>     time now and I want to update you on some ongoing problems.
>
>     Some users reported that the stderr.txt is filled with lots of
>     error messages and file size increases to several GB. The file was
>     truncated by the user and I didn't see any unusual
>     disk_size_limit_reached errors. So either this was an isolated
>     incident or the file size doesn't matter.
>
>     Many users reported that the VM is still running after the BOINC
>     Client was shut down. Lammert van der Veen did some research into
>     the cause, and I hope this can be fixed by limiting each host to
>     one concurrent VM and by the 26031 wrapper, as soon as I upgrade
>     our application.
>
>     Trickle messages worked fine with short tasks. Now that we have
>     some longer tasks, the trickle-up messages have stopped; we
>     haven't received any in over a month. I think I have an
>     explanation for this:
>     In the vboxwrapper, trickles are generated every X seconds of
>     cpu_time, not wall_clock_time, and because the vboxwrapper itself
>     is not doing much, its cpu_time increases very slowly. What I want
>     is a trickle message every X hours of VM runtime! Please look into
>     this asap, because without this feature I have to monitor the
>     deadlines and extend them by hand.
>     In case it helps:
>     A recent long-running task reported back with cpu_time=793517.4
>     and elapsed_time=839064.946827, but I also have a task with
>     cpu_time=4280.246 and elapsed_time=378984.324896. I can't see any
>     trickle messages for either of them.
>     first:
>     http://www.rnaworld.de/rnaworld/result.php?resultid=14920843 (Job
>     Duration is always 0, Elapsed time is increasing)
>     second:
>     http://www.rnaworld.de/rnaworld/result.php?resultid=14921349
>     (can't see anything in stderr)
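
To illustrate the request above, a small sketch of counting due
trickles from accumulated runtime. This is not vboxwrapper code:
`trickles_due` is a hypothetical helper, and the 6-hour interval is an
assumed value for "X", which the thread does not specify. The two time
values are those reported for the second task.

```python
def trickles_due(runtime_seconds, interval_seconds, already_sent=0):
    """Return how many new trickles are due after runtime_seconds.

    One trickle is due for each full interval of runtime; trickles
    that were already sent are subtracted.
    """
    total_due = int(runtime_seconds // interval_seconds)
    return max(0, total_due - already_sent)


if __name__ == "__main__":
    interval = 6 * 3600  # assumed value for "X": 6 hours

    # Counting by the wrapper's cpu_time: no full interval is reached.
    print(trickles_due(4280.246, interval))       # prints 0

    # Counting by elapsed (wall clock) time for the same task.
    print(trickles_due(378984.324896, interval))  # prints 17
```

With the same interval, the clock that feeds the counter decides
whether any trickle is ever sent for a task like this one.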
>
>     Speaking of deadlines, it would also be great to be able to
>     update the deadline on the client. I know of one user who edits
>     his client_state.xml by hand to prevent the Client from going
>     into high priority mode for RNAWorld when there is no need.
>
>     Regards
>     Christian
>

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.