Ok thanks, I'm probably reiterating the problem more than I need to
because it took me so long to figure out, ha :)

I've rendered gadgets and watched internal/external calls with the logging
I've added, but I don't have the same instrumentation in the old version
of the code.  I could put it in, though we are about to pull the plug on
this effort for a while (other priorities) and continue using Shindig
2.0.0.

Our container is almost exactly the same, with the exception of the
BlobEncrypter stuff that changed between Shindig 2.0.0 and 2.5.0.  We are
also now retrieving the container JS this way:
{shindig host}/gmodules/gadgets/js/container?c=1&container=ourContainerName

instead of this way:
{shindig host}/gmodules/gadgets/js/rpc.js?container=ourContainerName&c=1

As best I can tell from the wiki, the former is the way we should be
doing it now.

-Matt

On 9/5/14, 1:34 PM, "Ryan Baxter" <rbaxte...@gmail.com> wrote:

>I understand the issue; I am just trying to understand the root cause,
>like you ;)
>
>I guess what I am really wondering is whether you have done any analysis
>of the additional calls being made between 2.0 and 2.5.0-update1.  Have
>you taken the same gadget, rendered it using 2.0, observed how many
>requests go to /rpc, and then done the same thing with 2.5.0-update1?
>It would be nice to know whether there is really more traffic going to
>the servlet and, if so, where it is coming from.  I think we need to
>work backwards by identifying the root of the additional requests
>Shindig is making to itself to understand the cause of the problem.  I
>personally can't think of anything that would cause this off the top of
>my head, and I can't think of any way to stop them from happening.
>
>What about the container code you are using to render the gadget?  Are
>you using the common container in 2.5.0-update1, or is the container
>code the same and the only difference the server code?
>
>On Fri, Sep 5, 2014 at 12:48 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
>> Yes, I added logging for every call out the door, and the majority of
>> the calls which are holding up incoming threads are making HTTP calls
>> back to shindig itself, most notably the /rpc servlet endpoint.
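>>
>> The instrumentation is nothing fancy -- essentially a decorator around
>> the fetcher, roughly like the sketch below (class name and log format
>> are simplified here, so treat it as illustrative rather than our exact
>> code):
>>
>> import java.util.logging.Logger;
>> import org.apache.shindig.gadgets.GadgetException;
>> import org.apache.shindig.gadgets.http.HttpFetcher;
>> import org.apache.shindig.gadgets.http.HttpRequest;
>> import org.apache.shindig.gadgets.http.HttpResponse;
>>
>> /** Wraps the real fetcher and logs each outbound call's timing. */
>> public class LoggingHttpFetcher implements HttpFetcher {
>>   private static final Logger LOG =
>>       Logger.getLogger(LoggingHttpFetcher.class.getName());
>>   private final HttpFetcher delegate;
>>
>>   public LoggingHttpFetcher(HttpFetcher delegate) {
>>     this.delegate = delegate;
>>   }
>>
>>   public HttpResponse fetch(HttpRequest request) throws GadgetException {
>>     long start = System.currentTimeMillis();
>>     try {
>>       return delegate.fetch(request);
>>     } finally {
>>       // Loopback calls show up here as requests whose URI points back
>>       // at our own Shindig host (e.g. .../rpc).
>>       LOG.info("outbound fetch " + request.getUri() + " took "
>>           + (System.currentTimeMillis() - start) + "ms");
>>     }
>>   }
>> }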
>>
>> Basically, because shindig makes a loopback HTTP call to itself and
>> that call is served by the same HTTP threadpool, the threadpool is
>> getting exhausted and there's a cascading failure.  However, that
>> threadpool is the same size as we had on Shindig 2.0.0, so I really
>> can't explain the difference.
>>
>> I'm really wondering about any differences between 2.0.0 and
>> 2.5.0-update1 that might cause additional HTTP calls to be made, and
>> whether you can configure or code shindig not to make these loopbacks.
>>
>> -Matt
>>
>> On 9/5/14, 12:42 PM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>>
>>>So have you looked at what resources the fetcher is fetching?
>>>
>>>On Fri, Sep 5, 2014 at 12:17 PM, Merrill, Matt <mmerr...@mitre.org>
>>>wrote:
>>>> Yes, we have.  During a couple of the outages we did a thread dump
>>>> and saw that all (or almost all) of the threads were blocking in the
>>>> BasicHttpFetcher fetch method.  We also saw the number of threads
>>>> jump up to around the same number we have in our Tomcat HTTP thread
>>>> pool (300).
>>>>
>>>> As best I can tell, the issue seems to be that there are now MORE
>>>> calls being made to the various shindig servlets, which is causing
>>>> all of the HTTP threads to get consumed, but we can't explain why,
>>>> as the load is the same.  Once we roll back to the version of the
>>>> application which uses shindig 2.0.0, everything is absolutely fine.
>>>>
>>>> I'm very hesitant to just increase the thread pool without a good
>>>> understanding of what could cause this.  If someone knows of
>>>> something that changed between the 2.0.0 and 2.5.0-update1 versions
>>>> that may have caused more calls to be made, whether through the
>>>> OpenSocial Java API or internally inside shindig, that would be
>>>> great to know.
>>>>
>>>> Or perhaps a configuration parameter was introduced that we have set
>>>> wrong, and that is causing all these extra calls?
>>>>
>>>> We have already made sure our HTTP responses are cached at a very
>>>> high level, per your excellent advice.  However, because the majority
>>>> of the calls which seem to be taking a long time are RPC calls, and
>>>> those don't appear to get cached anyway, that wouldn't affect this
>>>> problem.
>>>>
>>>> And if someone knows the answers to the configuration/extension
>>>> questions about pipelining, that would be great.
>>>>
>>>> Thanks!
>>>>
>>>> -Matt
>>>>
>>>> On 9/5/14, 11:35 AM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>>>>
>>>>>So Matt, have you looked into what those threads are doing?  I agree
>>>>>that it seems odd that with 2.5.0-update1 you are running out of
>>>>>threads, but it is hard to pinpoint the reason without knowing what
>>>>>all those extra threads might be doing.
>>>>>
>>>>>
>>>>>On Thu, Sep 4, 2014 at 11:04 AM, Merrill, Matt <mmerr...@mitre.org>
>>>>>wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I haven't heard back on this, so I thought I'd provide some more
>>>>>> information in the hopes that perhaps someone has some ideas as to
>>>>>> what could be causing the issues we're seeing with shindig's
>>>>>> "loopback" HTTP calls.
>>>>>>
>>>>>> Under load we hit a deadlock-like situation because of the HTTP
>>>>>> calls shindig makes to itself when pipelining gadget data.
>>>>>> Basically, the HTTP request threadpool inside our Shindig Tomcat
>>>>>> container gets maxed out, and when shindig makes an HTTP rpc call
>>>>>> to itself to render a gadget which pipelines data, the request gets
>>>>>> held up waiting for the rpc call, which in turn may be blocked
>>>>>> waiting for the Tomcat container to free up a thread to handle it.
>>>>>> This only happens under load, of course.
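>>>>>>
>>>>>> Stripped of the Shindig specifics, the shape of the problem is just
>>>>>> tasks that block on work submitted to their own pool.  As a toy
>>>>>> illustration (plain JDK code, nothing Shindig-specific):
>>>>>>
>>>>>> import java.util.concurrent.ExecutorService;
>>>>>> import java.util.concurrent.Executors;
>>>>>> import java.util.concurrent.Future;
>>>>>>
>>>>>> public class LoopbackDeadlockDemo {
>>>>>>   public static void main(String[] args) {
>>>>>>     // Stand-in for the Tomcat HTTP pool (300 threads in our case).
>>>>>>     ExecutorService pool = Executors.newFixedThreadPool(2);
>>>>>>
>>>>>>     // Each "request" synchronously waits on a "loopback request"
>>>>>>     // submitted to the same pool.  Once every worker thread is
>>>>>>     // waiting, the queued loopback tasks can never run: deadlock.
>>>>>>     for (int i = 0; i < 2; i++) {
>>>>>>       pool.submit(() -> {
>>>>>>         Future<String> loopback = pool.submit(() -> "pipelined data");
>>>>>>         return loopback.get();  // blocks forever once the pool is full
>>>>>>       });
>>>>>>     }
>>>>>>     // The program never exits; a thread dump shows every pool thread
>>>>>>     // parked waiting, much like our BasicHttpFetcher dumps.
>>>>>>   }
>>>>>> }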
>>>>>>
>>>>>> This is puzzling to me because when we were running Shindig 2.0.0
>>>>>> we had the same size threadpool, yet now that we've upgraded to
>>>>>> Shindig 2.5.0-update1 the threadpool is getting maxed out.  I took
>>>>>> some timings inside our various shindig SPI implementations
>>>>>> (PersonService, AppDataService) and didn't see anything alarming.
>>>>>> There are also no spikes in user traffic.
>>>>>>
>>>>>> As I see it, there are a few options I could explore:
>>>>>>
>>>>>> 1) The "nuclear" option would be to simply increase our Tomcat HTTP
>>>>>> threadpool, but that doesn't seem prudent since the old version of
>>>>>> shindig worked just fine with that thread pool setting; I feel like
>>>>>> a bigger problem is being masked.  Is there anything that changed
>>>>>> between Shindig 2.0.0 and 2.5.0-update1 that could have caused some
>>>>>> kind of increase in traffic to shindig?  I tried looking at the
>>>>>> release notes in Jira, but that honestly wasn't very helpful.
>>>>>>
>>>>>> 2) Reconfigure Shindig to use the implemented SPI methods (Java
>>>>>> method calls) instead of making HTTP calls to itself through the
>>>>>> RPC API shindig exposes.  Based on Stanton's note below, it seems
>>>>>> like there are some configuration options for the RPC calls, but
>>>>>> they're mostly related to how the client-side JavaScript makes the
>>>>>> calls.  Is there anything server-side I can configure?  Perhaps
>>>>>> with Guice modules?
>>>>>>
>>>>>> 3) Explore whether there are hooks in the code for writing custom
>>>>>> code to do this.  The javadoc for
>>>>>> PipelinedDataPreloader.executeSocialRequest mentions that
>>>>>> "Subclasses can override to provide special handling (e.g., directly
>>>>>> invoking a local API)".  However, I'm missing something, because I
>>>>>> can't find where the preloader gets instantiated.  I see that the
>>>>>> PipelineExecutor takes in a Guice-injected instance of
>>>>>> PipelinedDataPreloader, but I don't see it getting created anywhere.
>>>>>> Where is this being configured?
>>>>>
>>>>>The intention was probably to make this possible via Guice, but there
>>>>>is no interface you can bind an implementation to.  You would have to
>>>>>replace the classes where PipelinedDataPreloader is used and then
>>>>>keep going up the chain until you get to a class where you can inject
>>>>>something via Guice.  It looks like a messy situation right now with
>>>>>the way the code is written.
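>>>>>
>>>>>If you do end up experimenting, the override itself would be small --
>>>>>something like the sketch below.  Treat the constructor arguments and
>>>>>the executeSocialRequest signature as assumptions on my part; verify
>>>>>both against the 2.5.0-update1 source before relying on this.
>>>>>
>>>>>import com.google.inject.Inject;
>>>>>import org.apache.shindig.config.ContainerConfig;
>>>>>import org.apache.shindig.gadgets.GadgetException;
>>>>>import org.apache.shindig.gadgets.http.HttpRequest;
>>>>>import org.apache.shindig.gadgets.http.HttpResponse;
>>>>>import org.apache.shindig.gadgets.http.RequestPipeline;
>>>>>import org.apache.shindig.gadgets.preload.PipelinedDataPreloader;
>>>>>
>>>>>/**
>>>>> * Sketch of the "directly invoking a local API" idea from the javadoc.
>>>>> * The superclass constructor parameters and the overridden method's
>>>>> * signature are assumed, not verified against the Shindig source.
>>>>> */
>>>>>public class LocalApiDataPreloader extends PipelinedDataPreloader {
>>>>>
>>>>>  @Inject
>>>>>  public LocalApiDataPreloader(RequestPipeline pipeline,
>>>>>                               ContainerConfig config) {
>>>>>    super(pipeline, config);
>>>>>  }
>>>>>
>>>>>  @Override
>>>>>  protected HttpResponse executeSocialRequest(HttpRequest request)
>>>>>      throws GadgetException {
>>>>>    // Here you could dispatch straight to your PersonService /
>>>>>    // AppDataService implementations and wrap the JSON they produce
>>>>>    // in an HttpResponse, instead of looping back over HTTP to /rpc.
>>>>>    // Delegating keeps the default behavior while you experiment.
>>>>>    return super.executeSocialRequest(request);
>>>>>  }
>>>>>}
>>>>>
>>>>>Even then, the messy part is getting Shindig to actually use the
>>>>>subclass, since the classes that consume PipelinedDataPreloader would
>>>>>need to be rebound or replaced as well.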
>>>>>
>>>>>>
>>>>>> Any help is appreciated!
>>>>>>
>>>>>> Thanks!
>>>>>> -Matt
>>>>>>
>>>>>> On 8/25/14, 4:55 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>>>>>>
>>>>>>>Thanks Stanton!
>>>>>>>
>>>>>>>I'm assuming you mean that the JavaScript calls will call
>>>>>>>listMethods and then make any necessary RPC calls, is that correct?
>>>>>>>Is there any other documentation on the introspection part?
>>>>>>>
>>>>>>>The reason I ask is that we're having problems server-side when
>>>>>>>Shindig is pipelining data.  For example, when you do the following
>>>>>>>in a gadget:
>>>>>>>
>>>>>>>  <os:ViewerRequest key="viewer" />
>>>>>>>  <os:DataRequest key="appData" method="appdata.get"
>>>>>>>                  userId="@viewer" appId="@app"/>
>>>>>>>
>>>>>>>
>>>>>>>Shindig appears to make HTTP requests to its own rpc endpoint in
>>>>>>>the process of rendering the gadget.  I could be missing something
>>>>>>>fundamental here, but is there any way to configure this differently
>>>>>>>so that shindig simply uses its SPI methods to retrieve this data
>>>>>>>instead?  Is this really just more of a convenience for the gadget
>>>>>>>developer than anything else?
>>>>>>>
>>>>>>>-Matt
>>>>>>>
>>>>>>>On 8/20/14, 4:14 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>>>>>>
>>>>>>>>Hi Matt,
>>>>>>>>
>>>>>>>>This behavior is configured in container.js in the
>>>>>>>>"gadgets.features" object.  If you look for "osapi" and
>>>>>>>>"osapi.services", you'll see some comments about this configuration
>>>>>>>>and the behavior.  features/container/service.js is where this
>>>>>>>>configuration is used and where the osapi services are
>>>>>>>>instantiated.  As you've seen, Shindig introspects to find the
>>>>>>>>available services by default.
>>>>>>>>
>>>>>>>>If I knew at one point why it behaves this way, I've since
>>>>>>>>forgotten.  There is a system.listMethods API[1] defined in the
>>>>>>>>Core API Server spec that this might simply be re-using to discover
>>>>>>>>the available services.
>>>>>>>>
>>>>>>>>I hope that helps.
>>>>>>>>
>>>>>>>>-Stanton
>>>>>>>>
>>>>>>>>[1]
>>>>>>>>http://opensocial.github.io/spec/trunk/Core-API-Server.xml#System-Service-ListMethods
>>>>>>>>
>>>>>>>>
>>>>>>>>On Tue, Aug 19, 2014 at 8:13 AM, Merrill, Matt <mmerr...@mitre.org>
>>>>>>>>wrote:
>>>>>>>>
>>>>>>>>> Good morning,
>>>>>>>>>
>>>>>>>>> I'm hoping some shindig veterans can help shed some light on the
>>>>>>>>> reason that Shindig makes HTTP rpc calls to itself as part of the
>>>>>>>>> gadget rendering process.  Why is this done as opposed to
>>>>>>>>> retrieving the information via internal Java method calls?  We
>>>>>>>>> are having lots of issues where this approach seems to be causing
>>>>>>>>> a cascading failure when calls get hung up in the HttpFetcher
>>>>>>>>> class.
>>>>>>>>>
>>>>>>>>> Also, I'm curious which calls are made in this manner and how
>>>>>>>>> they can be configured.  I have seen retrieval of viewer data
>>>>>>>>> done this way, as well as application data.
>>>>>>>>>
>>>>>>>>> I've looked for documentation on this topic before and have not
>>>>>>>>> seen any.  Any help is much appreciated.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> -Matt Merrill
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
