I understand the issue, I am just trying to understand the root cause
like you ;)

I guess what I am really wondering is whether you have done any
analysis of additional calls being made between 2.0 and 2.5.0-update1.
Have you taken the same gadget, rendered it using 2.0, observed how
many requests go to /rpc, and then done the same thing with
2.5.0-update1? It would be nice to know if there really is more
traffic going to the servlet and, if so, where it is coming from. I
think we need to work backwards by identifying the root of the
additional requests Shindig is making to itself to understand the
cause of the problem. I personally can't think of anything that would
cause this off the top of my head, and I can't think of any way to
stop them from happening.

What about the container code you are using to render the gadget? Are
you using the common container in 2.5.0-update1, or is the container
code the same and the only difference is the server code?
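If you want hard numbers, one low-tech way to get them is a counting
filter mapped in front of the Shindig servlets. This is only a rough
sketch, not Shindig code; the class name and the logging are made up
for illustration:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

// Counts hits to /rpc so totals can be compared between the 2.0.0 and
// 2.5.0-update1 deployments under the same rendering scenario.
public class RpcHitCountingFilter implements Filter {
  private final AtomicLong rpcHits = new AtomicLong();

  public void init(FilterConfig filterConfig) {}

  public void doFilter(ServletRequest req, ServletResponse resp,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    if (httpReq.getRequestURI().contains("/rpc")) {
      // The User-Agent hints at whether the call came from the browser
      // or from the server calling back into itself.
      System.out.println("/rpc hit #" + rpcHits.incrementAndGet()
          + " ua=" + httpReq.getHeader("User-Agent"));
    }
    chain.doFilter(req, resp);
  }

  public void destroy() {}
}

Render the same gadget against both versions and diff the counts; that
should tell us whether the extra traffic is real and who is making it.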
On Fri, Sep 5, 2014 at 12:48 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
> Yes, I added logging for every call out the door, and the majority of
> the calls which are holding up incoming threads are making http calls
> back to shindig itself, most notably the /rpc servlet endpoint.
>
> Basically, because shindig makes a loopback http call to itself and
> that’s on the same HTTP threadpool, the threadpool is starting to get
> exhausted and there’s a cascading failure. However, that threadpool
> is the same size as we had it on Shindig 2.0.0, so I really can’t
> explain the difference.
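>
> In miniature, the failure mode looks like this toy example (nothing
> Shindig-specific, and not our actual code; just a fixed-size pool
> whose tasks block on work submitted back into the same pool):
>
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> public class LoopbackDeadlockDemo {
>   public static void main(String[] args) {
>     // Stand-in for the Tomcat HTTP threadpool (ours is 300).
>     ExecutorService pool = Executors.newFixedThreadPool(2);
>     for (int i = 0; i < 2; i++) {
>       pool.submit(() -> {
>         // Each "request" waits synchronously on a "loopback rpc
>         // call" that can only run on the same pool.
>         Future<String> loopback = pool.submit(() -> "rpc response");
>         return loopback.get(); // blocks: no free thread for loopback
>       });
>     }
>     // Neither outer task ever completes; with 300 threads it just
>     // takes more concurrent load to reach the same state.
>   }
> }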
>
> I’m really wondering about any differences between 2.0.0 and
> 2.5.0-update1 that might cause additional HTTP calls to be made, and
> whether you can configure or code shindig not to make these loopbacks.
>
> -Matt
>
> On 9/5/14, 12:42 PM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>
>> So have you looked at what resources the fetcher is fetching?
>>
>> On Fri, Sep 5, 2014 at 12:17 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>> Yes, we have. During a couple of the outages we did a thread dump
>>> and saw that all (or almost all) of the threads were blocking on
>>> the BasicHTTPFetcher fetch method. We also saw the number of
>>> threads jump up to around the same number of threads we have in our
>>> Tomcat HTTP thread pool (300).
>>>
>>> As best I can tell, it seems as though there are now MORE calls to
>>> the various shindig servlets being made, which is causing all of
>>> the HTTP threads to get consumed, but we can’t explain why, as the
>>> load is the same. Once we roll back to the version of the
>>> application which uses shindig 2.0.0, everything is absolutely fine.
>>>
>>> I’m very hesitant to just increase the thread pool without a good
>>> understanding of what could cause this. If someone knows something
>>> that changed between the 2.0.0 and 2.5.0-update1 versions that may
>>> have caused more calls to be made, whether through the opensocial
>>> java API or internally inside shindig, that would be great to know.
>>>
>>> Or, perhaps a configuration parameter was introduced that we have
>>> set wrong that may have caused all these extra calls?
>>>
>>> We have already made sure our HTTP responses are cached at a very
>>> high level per your excellent advice. However, because the majority
>>> of the calls which seem to be taking a long time are RPC calls, it
>>> doesn’t appear these get cached anyway, so that wouldn’t affect
>>> this problem.
>>>
>>> And if someone knows the answers to the configuration/extension
>>> questions about pipelining, that would be great.
>>>
>>> Thanks!
>>>
>>> -Matt
>>>
>>> On 9/5/14, 11:35 AM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>>>
>>>> So Matt, have you looked into what those threads are doing? I
>>>> agree that it seems odd that with 2.5.0-update1 you are running
>>>> out of threads, but it is hard to pinpoint the reason without
>>>> knowing what all those extra threads might be doing.
>>>>
>>>> On Thu, Sep 4, 2014 at 11:04 AM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>>>> Hi all,
>>>>>
>>>>> I haven’t heard back on this, so I thought I’d provide some more
>>>>> information in the hopes that perhaps someone has some ideas as
>>>>> to what could be causing the issues we’re seeing with shindig’s
>>>>> “loopback” http calls.
>>>>>
>>>>> We have a situation where under load we hit a deadlock-like
>>>>> situation because of the HTTP calls shindig makes to itself when
>>>>> pipelining gadget data. Basically, the HTTP request threadpools
>>>>> inside our Shindig Tomcat container are getting maxed out, and
>>>>> when shindig makes an http rpc call to itself to render a gadget
>>>>> which pipelines data, the request gets held up waiting for the
>>>>> rpc call, which might itself be blocked behind the Tomcat
>>>>> container waiting to handle an HTTP request. This only happens
>>>>> under load, of course.
>>>>>
>>>>> This is puzzling to me because when we were running Shindig 2.0.0
>>>>> we had the same size threadpool, and now that we’ve upgraded to
>>>>> Shindig 2.5.0-update1 the threadpools seem to be getting maxed
>>>>> out. I took some timings inside our various shindig SPI
>>>>> implementations (PersonService, AppDataService) and I didn’t see
>>>>> anything alarming. There are also no spikes in user traffic.
>>>>>
>>>>> As I see it, we have a few options I could explore:
>>>>>
>>>>> 1) The “nuclear” option would be to simply increase our tomcat
>>>>> HTTP threadpools, but that doesn’t seem prudent since the old
>>>>> version of shindig worked just fine with that thread pool
>>>>> setting. I feel like a greater problem is being masked. Is there
>>>>> anything that changed between Shindig 2.0.0 and 2.5.0-update1
>>>>> that could have caused some kind of increase in traffic to
>>>>> shindig? I tried looking at release notes in Jira, but that
>>>>> honestly wasn’t very helpful at all.
>>>>>
>>>>> 2) Re-configure Shindig to use implemented SPI methods (java
>>>>> method calls) instead of making HTTP calls to itself through the
>>>>> RPC API shindig exposes. Based on Stanton’s note below, it seems
>>>>> like there are some configuration options for the RPC calls, but
>>>>> they’re mostly related to how the client-side javascript makes
>>>>> the calls. Is there anything server side I can configure? Perhaps
>>>>> with Guice modules?
>>>>>
>>>>> 3) Explore whether there are hooks in the code for writing custom
>>>>> code to do this. I see that the javadoc for
>>>>> PipelinedDataPreloader.executeSocialRequest mentions:
>>>>> "Subclasses can override to provide special handling (e.g.,
>>>>> directly invoking a local API)". However, I’m missing something,
>>>>> because I can’t find where the preloader gets instantiated. I see
>>>>> that the PipelineExecutor takes in a Guice injected instance of
>>>>> PipelinedDataPreloader; however, I don’t see it getting created
>>>>> anywhere. Where is this being configured?
>>>>
>>>> The intention was probably to make this possible via Guice, but
>>>> there is no interface you can bind an implementation to. You would
>>>> have to replace the classes where PipelinedDataPreloader is used
>>>> and then keep going up the chain until you get to a class where
>>>> you can inject something via Guice. Looks like a messy situation
>>>> right now with the current way the code is written.
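>>>>
>>>> The mechanics of "replace it up the chain" are just Guice's
>>>> bind-to-subclass pattern. A minimal sketch with made-up stand-in
>>>> classes (not the real Shindig types, since PipelinedDataPreloader
>>>> is not currently exposed for binding like this):
>>>>
>>>> import com.google.inject.AbstractModule;
>>>> import com.google.inject.Guice;
>>>>
>>>> // Hypothetical stand-in for the class issuing the loopback call.
>>>> class DataPreloader {
>>>>   String preload() { return "loopback HTTP call"; }
>>>> }
>>>>
>>>> // Hypothetical subclass that would invoke the local SPI directly.
>>>> class LocalDataPreloader extends DataPreloader {
>>>>   @Override String preload() { return "direct SPI call"; }
>>>> }
>>>>
>>>> public class PreloaderOverrideDemo extends AbstractModule {
>>>>   @Override protected void configure() {
>>>>     // Only possible when the type is reachable for binding, which
>>>>     // is exactly what Shindig's current wiring does not give you.
>>>>     bind(DataPreloader.class).to(LocalDataPreloader.class);
>>>>   }
>>>>
>>>>   public static void main(String[] args) {
>>>>     System.out.println(Guice.createInjector(new PreloaderOverrideDemo())
>>>>         .getInstance(DataPreloader.class)
>>>>         .preload()); // -> "direct SPI call"
>>>>   }
>>>> }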
>>>>
>>>>> Any help is appreciated!
>>>>>
>>>>> Thanks!
>>>>> -Matt
>>>>>
>>>>> On 8/25/14, 4:55 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>>>>>
>>>>>> Thanks Stanton!
>>>>>>
>>>>>> I’m assuming that you mean the javascript calls will call
>>>>>> listMethods and then make any necessary RPC calls, is that
>>>>>> correct? Is there any other documentation on the introspection
>>>>>> part?
>>>>>>
>>>>>> The reason I ask is that we’re having problems server side when
>>>>>> Shindig is pipelining data. For example, when you do the
>>>>>> following in a gadget:
>>>>>>
>>>>>> <os:ViewerRequest key="viewer" />
>>>>>> <os:DataRequest key="appData" method="appdata.get"
>>>>>>     userId="@viewer" appId="@app"/>
>>>>>>
>>>>>> Shindig appears to make HTTP requests to its own rpc endpoint in
>>>>>> the process of rendering the gadget. I could be missing
>>>>>> something fundamental here, but is there any way to configure
>>>>>> this differently so that shindig simply uses its SPI methods to
>>>>>> retrieve this data instead? Is this really just more of a
>>>>>> convenience for the gadget developer than anything else?
>>>>>>
>>>>>> -Matt
>>>>>>
>>>>>> On 8/20/14, 4:14 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Matt,
>>>>>>>
>>>>>>> This behavior is configured in container.js in the
>>>>>>> "gadgets.features" object. If you look for "osapi" and
>>>>>>> "osapi.services", you'll see some comments about this
>>>>>>> configuration and the behavior. features/container/service.js
>>>>>>> is where this configuration is used and where the osapi
>>>>>>> services are instantiated. As you've seen, Shindig introspects
>>>>>>> to find available services by default.
>>>>>>>
>>>>>>> If I knew at one point why this behaves this way, I've since
>>>>>>> forgotten. There is a system.listMethods API [1] defined in the
>>>>>>> Core API Server spec that this might simply be re-using to
>>>>>>> discover the available services.
>>>>>>>
>>>>>>> I hope that helps.
>>>>>>>
>>>>>>> -Stanton
>>>>>>>
>>>>>>> [1] http://opensocial.github.io/spec/trunk/Core-API-Server.xml#System-Service-ListMethods
>>>>>>>
>>>>>>> On Tue, Aug 19, 2014 at 8:13 AM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>>>>>>
>>>>>>>> Good morning,
>>>>>>>>
>>>>>>>> I’m hoping some shindig veterans can help shed some light on
>>>>>>>> the reason that Shindig makes HTTP rpc calls to itself as part
>>>>>>>> of the gadget rendering process. Why is this done as opposed
>>>>>>>> to retrieving information via internal Java method calls? We
>>>>>>>> are having lots of issues where this approach seems to be
>>>>>>>> causing a cascading failure when calls get hung up in the
>>>>>>>> HTTPFetcher class.
>>>>>>>>
>>>>>>>> Also, I’m curious what calls are made in this manner and how
>>>>>>>> they can be configured. I have seen retrieval of viewer data
>>>>>>>> done this way, as well as application data.
>>>>>>>>
>>>>>>>> I’ve looked for documentation on this topic before and have
>>>>>>>> not seen any. Any help is much appreciated.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> -Matt Merrill