So have you looked at what resources the fetcher is fetching?

On Fri, Sep 5, 2014 at 12:17 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
> Yes, we have.  During a couple of the outages we did a thread dump and saw
> that all (or almost all) of the threads were blocking on the
> BasicHTTPFetcher fetch method. We also saw the number of threads jump up
> to around the same number of threads we have in our Tomcat HTTP thread
> pool (300).
>
> As best I can tell, the issue is that MORE calls are now being made to the
> various shindig servlets, which is consuming all of the HTTP threads, but we
> can’t explain why since the load is the same. Once we roll back to the
> version of the application which uses shindig 2.0.0, everything is
> absolutely fine.
>
> I’m very hesitant to just increase the thread pool without a good
> understanding of what could cause this.  If someone knows of something that
> changed between the 2.0.0 and 2.5.0-update1 versions that may have caused
> more calls to be made, whether through the OpenSocial Java API or
> internally inside shindig, that would be great to know.
>
> Or perhaps a configuration parameter was introduced that we have set
> incorrectly, which may have caused all these extra calls?
>
> We have already made sure our HTTP responses are cached at a very high
> level, per your excellent advice. However, because the majority of the
> calls which seem to be taking a long time are RPC calls, it doesn’t appear
> these get cached anyway, so that wouldn’t affect this problem.
>
> And if someone knows the answers to the configuration/extension questions
> about pipelining, that would be great.
>
> Thanks!
>
> -Matt
>
> On 9/5/14, 11:35 AM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>
>>So Matt, have you looked into what those threads are doing?  I agree
>>that it seems odd that with 2.5.0-update1 you are running out of
>>threads, but it is hard to pinpoint the reason without knowing what all
>>those extra threads might be doing.
>>
>>
>>On Thu, Sep 4, 2014 at 11:04 AM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>> Hi all,
>>>
>>> I haven’t heard back on this, so I thought I’d provide some more
>>> information in the hopes that perhaps someone has some ideas as to what
>>> could be causing the issues we’re seeing with shindig’s “loopback” http
>>> calls.
>>>
>>> We have a situation where, under load, we hit a deadlock-like state
>>> because of the HTTP calls shindig makes to itself when pipelining gadget
>>> data. Basically, the HTTP request threadpool inside our Shindig Tomcat
>>> container gets maxed out, and when shindig makes an http rpc call to
>>> itself to render a gadget which pipelines data, the request gets held up
>>> waiting for the rpc call, which may itself be blocked because the Tomcat
>>> container has no free thread left to handle it.  This only happens under
>>> load, of course.
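>>>
>>> To illustrate the failure mode, here is a minimal, self-contained Java
>>> sketch (not Shindig code; the pool size and names are made up) of a
>>> bounded pool whose tasks block waiting on work that needs a thread from
>>> the same pool:
>>>
>>> import java.util.concurrent.*;
>>>
>>> public class SelfCallStarvation {
>>>     public static void main(String[] args) throws Exception {
>>>         // Stand-in for the container's bounded HTTP thread pool.
>>>         ExecutorService httpThreads = Executors.newFixedThreadPool(2);
>>>         CountDownLatch bothRendersStarted = new CountDownLatch(2);
>>>
>>>         // Each "render" request makes a loopback call that needs another
>>>         // thread from the same pool and blocks until that call finishes.
>>>         Callable<String> render = () -> {
>>>             bothRendersStarted.countDown();
>>>             bothRendersStarted.await();   // both pool threads are now busy
>>>             Future<String> loopbackRpc = httpThreads.submit(() -> "pipelined data");
>>>             return loopbackRpc.get();     // blocks its pool thread indefinitely
>>>         };
>>>
>>>         Future<String> first = httpThreads.submit(render);
>>>         Future<String> second = httpThreads.submit(render);
>>>         try {
>>>             first.get(5, TimeUnit.SECONDS);
>>>         } catch (TimeoutException e) {
>>>             // Every thread is blocked on a loopback call that can never
>>>             // be scheduled: the "deadlock-like" state described above.
>>>             System.out.println("starved: " + e);
>>>         } finally {
>>>             httpThreads.shutdownNow();
>>>         }
>>>     }
>>> }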
>>>
>>> This is puzzling to me because when we were running Shindig 2.0.0, we had
>>> the same size threadpool, and now that we’ve upgraded to Shindig
>>> 2.5.0-update1, the threadpools seem to be getting maxed out.  I took
>>> some timings inside of our various shindig SPI implementations
>>> (PersonService, AppDataService) and I didn’t see anything alarming.
>>> There are also no spikes in user traffic.
>>>
>>> As I see it, there are a few options I could explore:
>>>
>>> 1) The “nuclear” option would be to simply increase our Tomcat HTTP
>>> threadpools, but that doesn’t seem prudent since the old version of
>>> shindig worked just fine with that thread pool setting.  I feel like a
>>> larger problem is being masked. Is there anything that changed between
>>> Shindig 2.0.0 and 2.5.0-update1 that could have caused some kind of
>>> increase in traffic to shindig?  I tried looking at the release notes in
>>> Jira, but honestly that wasn’t very helpful at all.
>>>
>>> 2) Re-configure Shindig to use the implemented SPI methods (Java method
>>> calls) instead of making HTTP calls to itself through the RPC API shindig
>>> exposes.  Based on Stanton’s note below, it seems like there are some
>>> configuration options for the RPC calls, but they’re mostly related to
>>> how the client-side javascript makes the calls.  Is there anything
>>> server-side I can configure?  Perhaps with Guice modules?
>>>
>>> 3) Explore whether there are hooks in the code that would let us write
>>> custom code to do this. I see that the javadoc for
>>> PipelinedDataPreloader.executeSocialRequest mentions:
>>> "Subclasses can override to provide special handling (e.g., directly
>>> invoking a local API)”.  However, I’m missing something because I can’t
>>> find where the preloader gets instantiated.  I see that the
>>> PipelineExecutor takes in a Guice-injected instance of
>>> PipelinedDataPreloader, but I don’t see it getting created anywhere.
>>> Where is this being configured?
>>
>>The intention was probably to make this possible via Guice, but there
>>is no interface you can bind an implementation to.  You would have to
>>replace the classes where PipelinedDataPreloader is used and then
>>keep going up the chain until you get to a class where you can inject
>>something via Guice.  It looks like a messy situation right now with the
>>way the code is currently written.
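>>
>>For what it's worth, the general Guice pattern would look something like the
>>sketch below.  These are hypothetical stand-in classes, not the real Shindig
>>types (the actual PipelinedDataPreloader constructor and its consumers are
>>not reproduced here), and the binding only helps if the consuming classes
>>really do inject the type rather than constructing it themselves:
>>
>>import com.google.inject.AbstractModule;
>>import com.google.inject.Guice;
>>import com.google.inject.Injector;
>>
>>// Hypothetical stand-ins: a concrete preloader with no interface, and a
>>// subclass that short-circuits the loopback HTTP call with a local lookup.
>>class DataPreloader {
>>    public String preload(String key) {
>>        return "remote:" + key;   // imagine the loopback rpc request here
>>    }
>>}
>>
>>class LocalDataPreloader extends DataPreloader {
>>    @Override
>>    public String preload(String key) {
>>        return "local:" + key;    // directly invoke the local Java SPI instead
>>    }
>>}
>>
>>public class LocalPreloadModule extends AbstractModule {
>>    @Override
>>    protected void configure() {
>>        // Guice can bind a concrete class to a subclass, so consumers that
>>        // @Inject the concrete type receive the override.
>>        bind(DataPreloader.class).to(LocalDataPreloader.class);
>>    }
>>
>>    public static void main(String[] args) {
>>        Injector injector = Guice.createInjector(new LocalPreloadModule());
>>        DataPreloader preloader = injector.getInstance(DataPreloader.class);
>>        System.out.println(preloader.preload("viewer"));   // prints "local:viewer"
>>    }
>>}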
>>
>>>
>>> Any help is appreciated!
>>>
>>> Thanks!
>>> -Matt
>>>
>>> On 8/25/14, 4:55 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>>>
>>>>Thanks Stanton!
>>>>
>>>>I’m assuming you mean that the javascript calls will call listMethods and
>>>>then make any necessary RPC calls, is that correct?  Is there any other
>>>>documentation on the introspection part?
>>>>
>>>>The reason I ask is that we’re having problems server-side when Shindig
>>>>is pipelining data.  For example, when you do the following in a gadget:
>>>>
>>>><os:ViewerRequest key="viewer"/>
>>>><os:DataRequest key="appData" method="appdata.get" userId="@viewer"
>>>>                appId="@app"/>
>>>>
>>>>
>>>>Shindig appears to make HTTP requests to its own rpc endpoint in the
>>>>process of rendering the gadget.  I could be missing something
>>>>fundamental here, but is there any way to configure this differently so
>>>>that shindig simply uses its SPI methods to retrieve this data instead?
>>>>Is this really just more of a convenience for the gadget developer than
>>>>anything else?
>>>>
>>>>-Matt
>>>>
>>>>On 8/20/14, 4:14 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>>>
>>>>>Hi Matt,
>>>>>
>>>>>This behavior is configured in container.js in the "gadgets.features"
>>>>>object.  If you look for "osapi" and "osapi.services", you'll see some
>>>>>comments about this configuration and the behavior.
>>>>>features/container/service.js is where this configuration is used and
>>>>>where the osapi services are instantiated.  As you've seen, Shindig
>>>>>introspects to find available services by default.
>>>>>
>>>>>If I knew at one point why this behaves this way, I've since forgotten.
>>>>>There is a system.listMethods API[1] defined in the Core API Server spec
>>>>>that this might simply be re-using to discover the available services.
>>>>>
>>>>>I hope that helps.
>>>>>
>>>>>-Stanton
>>>>>
>>>>>[1]
>>>>>http://opensocial.github.io/spec/trunk/Core-API-Server.xml#System-Service-ListMethods
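>>>>>
>>>>>For reference, a rough Java sketch of what that discovery call looks like
>>>>>on the wire.  The host and endpoint below are assumptions (point it at
>>>>>wherever your deployment maps Shindig's JSON-RPC servlet), and the payload
>>>>>is just a single listMethods call:
>>>>>
>>>>>import java.io.BufferedReader;
>>>>>import java.io.InputStreamReader;
>>>>>import java.io.OutputStream;
>>>>>import java.net.HttpURLConnection;
>>>>>import java.net.URL;
>>>>>import java.nio.charset.StandardCharsets;
>>>>>
>>>>>public class ListMethodsProbe {
>>>>>    public static void main(String[] args) throws Exception {
>>>>>        // Assumed endpoint; adjust for your container's servlet mapping.
>>>>>        URL url = new URL("http://localhost:8080/rpc");
>>>>>        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>>>>>        conn.setRequestMethod("POST");
>>>>>        conn.setRequestProperty("Content-Type", "application/json");
>>>>>        conn.setDoOutput(true);
>>>>>
>>>>>        // One JSON-RPC call asking the server to enumerate its services.
>>>>>        String body = "{\"method\":\"system.listMethods\",\"id\":\"methods\"}";
>>>>>        try (OutputStream out = conn.getOutputStream()) {
>>>>>            out.write(body.getBytes(StandardCharsets.UTF_8));
>>>>>        }
>>>>>
>>>>>        // The response should name the available methods, which is what
>>>>>        // the container-side introspection consumes when wiring up osapi.
>>>>>        try (BufferedReader in = new BufferedReader(
>>>>>                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
>>>>>            String line;
>>>>>            while ((line = in.readLine()) != null) {
>>>>>                System.out.println(line);
>>>>>            }
>>>>>        }
>>>>>    }
>>>>>}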
>>>>>
>>>>>
>>>>>On Tue, Aug 19, 2014 at 8:13 AM, Merrill, Matt <mmerr...@mitre.org>
>>>>>wrote:
>>>>>
>>>>>> Good morning,
>>>>>>
>>>>>> I’m hoping some shindig veterans can help shed some light on the
>>>>>> reason that Shindig makes HTTP rpc calls to itself as part of the
>>>>>> gadget rendering process.  Why is this done as opposed to retrieving
>>>>>> the information via internal Java method calls?  We are having lots of
>>>>>> issues where this approach seems to be causing a cascading failure
>>>>>> when calls get hung up in the HTTPFetcher class.
>>>>>>
>>>>>> Also, I’m curious what calls are made in this manner and how they can
>>>>>> be configured.  I have seen retrieval of viewer data done this way, as
>>>>>> well as application data.
>>>>>>
>>>>>> I’ve looked for documentation on this topic before and have not seen
>>>>>> any.  Any help is much appreciated.
>>>>>>
>>>>>> Thanks!
>>>>>> -Matt Merrill
>>>>>>
>>>>
>>>
>
