I understand the issue; I am just trying to understand the root cause, like you ;)

I guess what I am really wondering is whether you have done any
analysis on any additional calls being made between 2.0 and
2.5.0-update1.  Have you taken the same gadget, rendered it using 2.0,
observed how many requests go to /rpc, and then done the same thing
with 2.5.0-update1?  It would be nice to know if there really is more
traffic going to the servlet and, if so, where it is coming from.  I
think we need to work backwards by identifying the root of the
additional requests Shindig is making to itself to understand the
cause of the problem.  I personally can't think of anything that would
cause this off the top of my head, and I can't think of any way to
stop them from happening.

What about the container code you are using to render the gadget?  Are
you using the common container in 2.5.0-update1, or is the container
code the same, with the only difference being the server code?

On Fri, Sep 5, 2014 at 12:48 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
> Yes, I added logging for every call out the door, and the majority of
> the calls which are holding up incoming threads are making HTTP calls
> back to Shindig itself, most notably to the /rpc servlet endpoint.
>
> Basically, because Shindig makes a loopback HTTP call to itself on the
> same HTTP thread pool, the thread pool is starting to get exhausted and
> there’s a cascading failure.  However, that thread pool is the same
> size as we had it on Shindig 2.0.0, so I really can’t explain the
> difference.
>
> I’m really wondering about any differences between 2.0.0 and
> 2.5.0-update1 that might cause additional HTTP calls to be made, and
> whether you can configure or code Shindig not to make these loopbacks.
>
> -Matt
>
> On 9/5/14, 12:42 PM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>
>>So have you looked at what resources the fetcher is fetching?
>>
>>On Fri, Sep 5, 2014 at 12:17 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>> Yes, we have.  During a couple of the outages we did a thread dump
>>> and saw that all (or almost all) of the threads were blocked in the
>>> BasicHttpFetcher fetch method.  We also saw the number of threads jump
>>> up to around the same number of threads we have in our Tomcat HTTP
>>> thread pool (300).
>>>
>>> From what I can tell, it seems as though there are now MORE calls
>>> being made to the various Shindig servlets, which is causing all of
>>> the HTTP threads to get consumed, but we can’t explain why, as the
>>> load is the same.  Once we roll back to the version of the application
>>> which uses Shindig 2.0.0, everything is absolutely fine.
>>>
>>> I’m very hesitant to just increase the thread pool without a good
>>> understanding of what could cause this.  If someone knows of something
>>> that changed between the 2.0.0 and 2.5.0-update1 versions that may have
>>> caused more calls to be made, whether through the opensocial java API
>>> or internally inside shindig, that would be great to know.
>>>
>>> Or, perhaps a configuration parameter was introduced that we have set
>>> wrong that may have caused all these extra calls?
>>>
>>> We have already made sure our HTTP responses are cached at a very high
>>> level per your excellent advice. However, because the majority of the
>>> calls which seem to be taking a long time are RPC calls, it doesn’t
>>>appear
>>> these get cached anyway so that wouldn’t affect this problem.
>>>
>>> And if someone knows the answers to the configuration/extension
>>>questions
>>> about pipelining, that would be great.
>>>
>>> Thanks!
>>>
>>> -Matt
>>>
>>> On 9/5/14, 11:35 AM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>>>
>>>>So Matt, have you looked into what those threads are doing?  I agree
>>>>that it seems odd that with 2.5.0-update1 you are running out of
>>>>threads, but it is hard to pinpoint the reason without knowing what all
>>>>those extra threads might be doing.
>>>>
>>>>
>>>>On Thu, Sep 4, 2014 at 11:04 AM, Merrill, Matt <mmerr...@mitre.org>
>>>>wrote:
>>>>> Hi all,
>>>>>
>>>>> I haven’t heard back on this, so I thought I’d provide some more
>>>>> information in the hopes that perhaps someone has some ideas as to
>>>>>what
>>>>> could be causing the issues we’re seeing with shindig’s “loopback”
>>>>>http
>>>>> calls.
>>>>>
>>>>> We have a situation where, under load, we hit a deadlock-like
>>>>> situation because of the HTTP calls Shindig makes to itself when
>>>>> pipelining gadget data.  Basically, the HTTP request thread pools
>>>>> inside our Shindig Tomcat container are getting maxed out, and when
>>>>> Shindig makes an HTTP rpc call to itself to render a gadget which
>>>>> pipelines data, the request gets held up waiting for the rpc call,
>>>>> which in turn may be blocked waiting for the Tomcat container to
>>>>> free up a thread to handle it.  This only happens under load, of
>>>>> course.
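>>>>>
>>>>> To illustrate the failure mode we think we're hitting, here is a toy
>>>>> example (not Shindig code): every worker on a bounded pool submits a
>>>>> loopback task to the same pool and blocks on it, so once all workers
>>>>> are waiting there is nothing left to run the loopback tasks:
>>>>>
>>>>>   import java.util.concurrent.*;
>>>>>
>>>>>   public class LoopbackStarvation {
>>>>>     public static void main(String[] args) {
>>>>>       // Stand-in for the Tomcat HTTP thread pool.
>>>>>       final ExecutorService pool = Executors.newFixedThreadPool(4);
>>>>>       for (int i = 0; i < 4; i++) {
>>>>>         pool.submit(new Callable<String>() {
>>>>>           public String call() throws Exception {
>>>>>             // The "render" request issues a loopback "/rpc" call...
>>>>>             Future<String> rpc = pool.submit(new Callable<String>() {
>>>>>               public String call() { return "rpc result"; }
>>>>>             });
>>>>>             // ...and parks here forever once all 4 workers do this.
>>>>>             return rpc.get();
>>>>>           }
>>>>>         });
>>>>>       }
>>>>>     }
>>>>>   }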
>>>>>
>>>>> This is puzzling to me because when we were running Shindig 2.0.0,
>>>>> we had the same size thread pool; now that we’ve upgraded to Shindig
>>>>> 2.5.0-update1, the thread pools seem to be getting maxed out.  I took
>>>>> some timings inside our various Shindig SPI implementations
>>>>> (PersonService, AppDataService) and I didn’t see anything alarming.
>>>>> There are also no spikes in user traffic.
>>>>>
>>>>> As I see it, there are a few options I could explore:
>>>>>
>>>>> 1) The “nuclear” option would be to simply increase our tomcat HTTP
>>>>> threadpools, but that doesn’t seem prudent since the old version of
>>>>> shindig worked just fine with that thread pool setting.  I feel like a
>>>>> greater problem is being masked. Is there anything that changed
>>>>>between
>>>>> Shindig 2.0.0 and 2.5.0-update1 that could have caused some kind of
>>>>> increase in traffic to shindig?  I tried looking at release notes in
>>>>>Jira,
>>>>> but that honestly wasn’t very helpful at all.
>>>>>
>>>>> 2) Re-configure Shindig to use implemented SPI methods (java method
>>>>>calls)
>>>>> instead of making HTTP calls to itself through the RPC API shindig
>>>>> exposes?  Based on Stanton’s note below, it seems like there are some
>>>>> configuration options for the RPC calls, but they’re mostly related to
>>>>>how
>>>>> the client-side javascript makes the calls.  Is there anything server
>>>>>side
>>>>> I can configure?  Perhaps with Guice modules?
>>>>>
>>>>> 3) Explore whether there are hooks in the code that would let us
>>>>> write custom code to do this.  I see that the javadoc for
>>>>> PipelinedDataPreloader.executeSocialRequest mentions:
>>>>> "Subclasses can override to provide special handling (e.g., directly
>>>>> invoking a local API)".  However, I’m missing something because I
>>>>> can’t find where the preloader gets instantiated.  I see that the
>>>>> PipelineExecutor takes in a Guice-injected instance of
>>>>> PipelinedDataPreloader; however, I don’t see it getting created
>>>>> anywhere.  Where is this being configured?
>>>>
>>>>The intention was probably to make this possible via Guice, but there
>>>>is no interface you can bind an implementation to.  You would have to
>>>>replace the classes where PipelinedDataPreloader is used and then
>>>>keep going up the chain until you get to a class where you can inject
>>>>something via Guice.  It looks like a messy situation right now with
>>>>the way the code is written.
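>>>>
>>>>To be concrete about the rebinding pattern I mean, here is a
>>>>self-contained sketch.  All of the names in it are made up, not
>>>>Shindig's; the hard part in the real code is finding a consumer of
>>>>PipelinedDataPreloader that Guice actually exposes a binding point
>>>>for:
>>>>
>>>>  import com.google.inject.AbstractModule;
>>>>  import com.google.inject.Guice;
>>>>  import com.google.inject.Injector;
>>>>
>>>>  public class RebindSketch {
>>>>    // Hypothetical stand-ins for a Shindig consumer class.
>>>>    interface SocialDataSource { String fetch(String key); }
>>>>
>>>>    // What you get today: data fetched over the loopback HTTP hop.
>>>>    static class HttpLoopbackSource implements SocialDataSource {
>>>>      public String fetch(String key) { return "via /rpc: " + key; }
>>>>    }
>>>>
>>>>    // What you want: data fetched with a plain Java call.
>>>>    static class LocalJavaSource implements SocialDataSource {
>>>>      public String fetch(String key) { return "via local SPI: " + key; }
>>>>    }
>>>>
>>>>    public static void main(String[] args) {
>>>>      // Binding the interface to the local implementation is the whole
>>>>      // trick; without an interface there is nothing to rebind.
>>>>      Injector injector = Guice.createInjector(new AbstractModule() {
>>>>        @Override protected void configure() {
>>>>          bind(SocialDataSource.class).to(LocalJavaSource.class);
>>>>        }
>>>>      });
>>>>      System.out.println(
>>>>          injector.getInstance(SocialDataSource.class).fetch("viewer"));
>>>>    }
>>>>  }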
>>>>
>>>>>
>>>>> Any help is appreciated!
>>>>>
>>>>> Thanks!
>>>>> -Matt
>>>>>
>>>>> On 8/25/14, 4:55 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>>>>>
>>>>>>Thanks Stanton!
>>>>>>
>>>>>>I’m assuming that you mean the javascript calls will call listMethods
>>>>>>and then make any necessary RPC calls, is that correct?  Is there any
>>>>>>other documentation on the introspection part?
>>>>>>
>>>>>>The reason I ask is that we’re having problems server side when
>>>>>>Shindig is pipelining data.  For example, when you do the following
>>>>>>in a gadget:
>>>>>>  <os:ViewerRequest key="viewer"/>
>>>>>>  <os:DataRequest key="appData" method="appdata.get"
>>>>>>      userId="@viewer" appId="@app"/>
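>>>>>>
>>>>>>(For completeness, those tags sit inside an os-data script block in
>>>>>>the gadget spec, roughly like this; the feature name here is from
>>>>>>memory, so double-check it:
>>>>>>
>>>>>>  <Module>
>>>>>>    <ModulePrefs title="pipelining example">
>>>>>>      <Require feature="opensocial-data"/>
>>>>>>    </ModulePrefs>
>>>>>>    <Content type="html">
>>>>>>    <![CDATA[
>>>>>>      <script type="text/os-data"
>>>>>>              xmlns:os="http://ns.opensocial.org/2008/markup">
>>>>>>        <os:ViewerRequest key="viewer"/>
>>>>>>        <os:DataRequest key="appData" method="appdata.get"
>>>>>>            userId="@viewer" appId="@app"/>
>>>>>>      </script>
>>>>>>    ]]>
>>>>>>    </Content>
>>>>>>  </Module>
>>>>>>
>>>>>>and it is while rendering this that the server resolves the
>>>>>>requests.)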
>>>>>>
>>>>>>
>>>>>>Shindig appears to make HTTP requests to its own rpc endpoint in the
>>>>>>process of rendering the gadget.  I could be missing something
>>>>>>fundamental here, but is there any way to configure this differently
>>>>>>so that Shindig simply uses its SPI methods to retrieve this data
>>>>>>instead?  Is this really just more of a convenience for the gadget
>>>>>>developer than anything else?
>>>>>>
>>>>>>-Matt
>>>>>>
>>>>>>On 8/20/14, 4:14 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>>>>>
>>>>>>>Hi Matt,
>>>>>>>
>>>>>>>This behavior is configured in container.js in the "gadgets.features"
>>>>>>>object.  If you look for "osapi" and "osapi.services", you'll see
>>>>>>>some
>>>>>>>comments about this configuration and the behavior.
>>>>>>>features/container/service.js is where this configuration is used and
>>>>>>>where
>>>>>>>the osapi services are instantiated.  As you've seen, Shindig
>>>>>>>introspects
>>>>>>>to find available services by default.
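>>>>>>>
>>>>>>>From memory (so double-check your shipped container.js), the
>>>>>>>relevant block looks roughly like:
>>>>>>>
>>>>>>>  "gadgets.features" : {
>>>>>>>    "osapi.services" : {
>>>>>>>      // Default: discover services by introspecting each endpoint.
>>>>>>>      "gadgets.rpc" : ["container.listMethods"]
>>>>>>>    },
>>>>>>>    "osapi" : {
>>>>>>>      // The endpoints the container introspects and calls.
>>>>>>>      "endPoints" : ["http://%host%/rpc"]
>>>>>>>    }
>>>>>>>  }
>>>>>>>
>>>>>>>If I remember right, enumerating the services per endpoint there,
>>>>>>>instead of leaving the "container.listMethods" marker, is what skips
>>>>>>>the introspection call.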
>>>>>>>
>>>>>>>If I knew at one point why this behaves this way, I've since
>>>>>>>forgotten.
>>>>>>>There is a system.listMethods API[1] defined in the Core API Server
>>>>>>>spec
>>>>>>>that this might simply be re-using to discover the available
>>>>>>>services.
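>>>>>>>
>>>>>>>The shape of that call (from the spec; the actual result list
>>>>>>>depends on the server) is just:
>>>>>>>
>>>>>>>  request:  {"method" : "system.listMethods", "id" : "svc"}
>>>>>>>  response: {"id" : "svc",
>>>>>>>             "result" : ["system.listMethods", "people.get",
>>>>>>>                         "appdata.get", ...]}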
>>>>>>>
>>>>>>>I hope that helps.
>>>>>>>
>>>>>>>-Stanton
>>>>>>>
>>>>>>>[1]
>>>>>>>http://opensocial.github.io/spec/trunk/Core-API-Server.xml#System-Service-ListMethods
>>>>>>>
>>>>>>>
>>>>>>>On Tue, Aug 19, 2014 at 8:13 AM, Merrill, Matt <mmerr...@mitre.org>
>>>>>>>wrote:
>>>>>>>
>>>>>>>> Good morning,
>>>>>>>>
>>>>>>>> I’m hoping some shindig veterans can help shed some light on the
>>>>>>>> reason that Shindig makes HTTP rpc calls to itself as part of the
>>>>>>>> gadget rendering process.  Why is this done as opposed to
>>>>>>>> retrieving information via internal Java method calls?  We are
>>>>>>>> having lots of issues where this approach seems to be causing a
>>>>>>>> cascading failure when calls get hung up in the HttpFetcher class.
>>>>>>>>
>>>>>>>> Also, I’m curious which calls are made in this manner and how they
>>>>>>>> can be configured.  I have seen retrieval of viewer data done this
>>>>>>>> way, as well as application data.
>>>>>>>>
>>>>>>>> I’ve looked for documentation on this topic before and have not
>>>>>>>> seen any.  Any help is much appreciated.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> -Matt Merrill
>>>>>>>>
>>>>>>
>>>>>
>>>
>
