[ 
https://issues.apache.org/jira/browse/BROOKLYN-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576225#comment-15576225
 ] 

Aled Sage commented on BROOKLYN-364:
------------------------------------

Looking at the jclouds wire log (running {{grep -E "(GET|POST|DELETE) http|<< 
HTTP/1.1 200|Content-Length: " log/jclouds.debug.log}}):

{noformat}
2016-10-13 11:37:19,714 DEBUG 102 j.headers [ger-gaovbYIG-171] >> GET 
https://api.softlayer.com/rest/v3/SoftLayer_Location_Datacenter/Datacenters?objectMask=locationAddress%3Bregions
 HTTP/1.1
2016-10-13 11:37:20,961 DEBUG 102 j.headers [ger-gaovbYIG-171] << HTTP/1.1 200 
OK
2016-10-13 11:37:20,962 DEBUG 102 j.headers [ger-gaovbYIG-171] << 
Content-Length: 15710
2016-10-13 11:37:21,027 DEBUG 102 j.headers [ger-gaovbYIG-171] >> GET 
https://api.softlayer.com/rest/v3/SoftLayer_Virtual_Guest/getCreateObjectOptions
 HTTP/1.1
2016-10-13 11:37:24,043 DEBUG 102 j.headers [ger-gaovbYIG-171] << HTTP/1.1 200 
OK
2016-10-13 11:37:24,045 DEBUG 102 j.headers [ger-gaovbYIG-171] << 
Content-Length: 34882
2016-10-13 11:37:24,817 DEBUG 102 j.headers [ger-gaovbYIG-171] >> GET 
https://api.softlayer.com/rest/v3/SoftLayer_Software_Description/getAllObjects?objectMask=id%3Bname%3Bversion%3BoperatingSystem%3BlongDescription%3BreferenceCode
 HTTP/1.1
2016-10-13 11:37:34,640 DEBUG 102 j.headers [ger-gaovbYIG-171] << HTTP/1.1 200 
OK
2016-10-13 11:37:34,641 DEBUG 102 j.headers [ger-gaovbYIG-171] << 
Content-Length: 249182
2016-10-13 11:37:36,638 DEBUG 102 j.headers [ger-gaovbYIG-171] >> GET 
https://api.softlayer.com/rest/v3/SoftLayer_Account/VirtualGuests?objectMask=powerState%3BoperatingSystem.passwords%3Bdatacenter%3BbillingItem%3BblockDevices.diskImage%3BtagReferences
 HTTP/1.1
2016-10-13 11:37:42,250 DEBUG 102 j.headers [ger-gaovbYIG-171] << HTTP/1.1 200 
OK
2016-10-13 11:37:42,252 DEBUG 102 j.headers [ger-gaovbYIG-171] << 
Content-Length: 152977604

// Note the really long delay between the above command and the next one

2016-10-13 11:58:07,153 DEBUG 102 j.headers [ger-gaovbYIG-171] >> GET 
https://api.softlayer.com/rest/v3/SoftLayer_Location_Datacenter/Datacenters?objectMask=locationAddress%3Bregions
 HTTP/1.1
2016-10-13 11:58:08,135 DEBUG 102 j.headers [ger-gaovbYIG-171] << HTTP/1.1 200 
OK
2016-10-13 11:58:08,135 DEBUG 102 j.headers [ger-gaovbYIG-171] << 
Content-Length: 15710
2016-10-13 11:58:08,150 DEBUG 102 j.headers [ger-gaovbYIG-171] >> GET 
https://api.softlayer.com/rest/v3/SoftLayer_Virtual_Guest/getCreateObjectOptions
 HTTP/1.1
2016-10-13 11:58:10,281 DEBUG 102 j.headers [ger-gaovbYIG-171] << HTTP/1.1 200 
OK
2016-10-13 11:58:10,282 DEBUG 102 j.headers [ger-gaovbYIG-171] << 
Content-Length: 34882
...
{noformat}

It then hangs for a long time (before writing out the 153MB message that it 
just received!).

Looking in jstack at what the thread is doing (calling into jclouds 
{{createNodesInGroup}}), it is trying to log by copying the byte stream of the 
message.:

{noformat}
"brooklyn-execmanager-gaovbYIG-171" daemon prio=5 tid=0x00007fa9c6670000 
nid=0x121e3 runnable [0x000070000332b000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
        at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
        at sun.security.ssl.InputRecord.read(InputRecord.java:509)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
        - locked <0x0000000784768d50> (a java.lang.Object)
        at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
        at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
        - locked <0x0000000784768e00> (a sun.security.ssl.AppInputStream)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x000000078476ae28> (a java.io.BufferedInputStream)
        at sun.net.www.MeteredStream.read(MeteredStream.java:134)
        - locked <0x00000007847c6c48> (a sun.net.www.MeteredStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at 
sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3053)
        at 
sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3047)
        at com.google.common.io.ByteStreams.copy(ByteStreams.java:175)
        at org.jclouds.logging.internal.Wire.copy(Wire.java:93)
        at org.jclouds.logging.internal.Wire.input(Wire.java:119)
        at org.jclouds.logging.internal.Wire.input(Wire.java:126)
        at 
org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:101)
        at 
org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:90)
        at 
org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:73)
        at 
org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:44)
        at 
org.jclouds.reflect.FunctionalReflection$FunctionalInvocationHandler.handleInvocation(FunctionalReflection.java:117)
        at 
com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
        at com.sun.proxy.$Proxy175.listVirtualGuests(Unknown Source)
        at 
org.jclouds.softlayer.compute.strategy.SoftLayerComputeServiceAdapter.listNodes(SoftLayerComputeServiceAdapter.java:345)
        at 
org.jclouds.compute.strategy.impl.AdaptingComputeServiceStrategies.listDetailsOnNodesMatching(AdaptingComputeServiceStrategies.java:122)
        at 
org.jclouds.compute.strategy.impl.AdaptingComputeServiceStrategies.listNodes(AdaptingComputeServiceStrategies.java:112)
        at 
org.jclouds.compute.strategy.impl.CreateNodesWithGroupEncodedIntoNameThenAddToSet.getNextNames(CreateNodesWithGroupEncodedIntoNameThenAddToSet.java:192)
        at 
org.jclouds.compute.strategy.impl.CreateNodesWithGroupEncodedIntoNameThenAddToSet.execute(CreateNodesWithGroupEncodedIntoNameThenAddToSet.java:121)
        at 
org.jclouds.compute.internal.BaseComputeService.createNodesInGroup(BaseComputeService.java:210)
        at 
org.apache.brooklyn.location.jclouds.JcloudsLocation.obtainOnce(JcloudsLocation.java:741)
        at 
org.apache.brooklyn.location.jclouds.JcloudsLocation.obtain(JcloudsLocation.java:629)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ObtainLocationTask.call(MachineLifecycleEffectorTasks.java:459)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ObtainLocationTask.call(MachineLifecycleEffectorTasks.java:449)
        at 
org.apache.brooklyn.util.core.task.Tasks.withBlockingDetails(Tasks.java:106)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisionMachineTask.call(MachineLifecycleEffectorTasks.java:431)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisionMachineTask.call(MachineLifecycleEffectorTasks.java:407)
        at 
org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
        at 
org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

Once that finishes, the memory usage increases a lot. According to {{jmap 
-histo:live <pid>}}, the main culprit is char arrays. It increases from 66MB to 
peak at 939MB (with 1074MB of total memory used), before going back down again. 
e.g.:

{noformat}
jmap -histo:live $PID | grep -E "1:.*\[C" && jmap -histo:live $PID | tail -1

   1:        203312      633096888  [C
Total       1958471      766581424
{noformat}

In more detail, we see the char array bytes used going from 66MB -> 104MB -> 
323MB -> 625MB -> 939MB -> 461MB -> 633MB -> 327MB -> 21MB.

Looking at the corresponding jstack output at the times when we transition to 
bigger memory usage, it seems to be correlated to a switch to a different stage 
in the logging. For example, jclouds {{Wire.input}} copying the http response 
payload to a file-backed output stream, then reading that into a StringBuilder, 
then calling buffer.toString, then calling slf4j which calls log4j which does 
another buffer.toString.

---
The huge message looks like a bug on Softlayer's side; I've reported it in 
http://stackoverflow.com/questions/40050395/softlayer-api-call-to-virtualguests-returns-huge-duplicated-response

The response contains 394 copies of the details of each VM!

We can reproduce this huge response with a simple curl command.

---
This happens with a Brooklyn build from a couple of months ago - it is not a 
regression on our side.

---
Turning the wire log off helps (we no longer peak at 939MB of char arrays; 
instead we peak at 564MB (because it is still holding multiple copies in-memory 
while it parses the json).

In Karaf, you can turn the wire log off by editing 
{{etc/org.ops4j.pax.logging.cfg}} to set {{log4j.logger.jclouds.wire=INFO, 
jclouds}} (instead of it saying "DEBUG").

---
Andrea has already found a workaround for jclouds - see 
https://github.com/jclouds/jclouds/pull/1020.


> OutOfMemoryError deploying to Softlayer
> ---------------------------------------
>
>                 Key: BROOKLYN-364
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-364
>             Project: Brooklyn
>          Issue Type: Bug
>            Reporter: Aled Sage
>            Priority: Critical
>
> When deploying to SoftLayer, one often gets {{OutOfMemoryError}} due to heap 
> space. This is more likely to happen with several concurrent apps, or with 
> large blueprints.
> The workaround is to (substantially) increase the allocated memory (first 
> checking that the VM can support that amount of memory!). For example, set 
> {{-Xmx=4G}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to