Yes, I added logging for every call out the door, and the majority of the
calls that are holding up incoming threads are HTTP calls back to Shindig
itself, most notably to the /rpc servlet endpoint.
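
(In case it’s useful to anyone else, one simple way to get that logging is
to wrap the injected HttpFetcher in a timing decorator, roughly like the
sketch below.  I’m going from memory on the exact HttpFetcher interface and
the HttpRequest.getUri() call, so double-check those against the source you
have.)

  import org.apache.shindig.gadgets.GadgetException;
  import org.apache.shindig.gadgets.http.BasicHttpFetcher;
  import org.apache.shindig.gadgets.http.HttpFetcher;
  import org.apache.shindig.gadgets.http.HttpRequest;
  import org.apache.shindig.gadgets.http.HttpResponse;

  import com.google.inject.Inject;
  import java.util.logging.Logger;

  /** Logs the target URI and elapsed time of every outbound fetch. */
  public class LoggingHttpFetcher implements HttpFetcher {
    private static final Logger LOG =
        Logger.getLogger(LoggingHttpFetcher.class.getName());
    private final HttpFetcher delegate;

    @Inject
    public LoggingHttpFetcher(BasicHttpFetcher delegate) {
      this.delegate = delegate;
    }

    @Override
    public HttpResponse fetch(HttpRequest request) throws GadgetException {
      long start = System.currentTimeMillis();
      try {
        return delegate.fetch(request);
      } finally {
        LOG.info("outbound fetch " + request.getUri() + " took "
            + (System.currentTimeMillis() - start) + " ms");
      }
    }
  }

Bound in place of the default fetcher in a Guice module, something like
this makes it easy to spot which requests point back at the local /rpc
endpoint.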

Basically, because shindig makes loopback http calls to itself on the same
HTTP threadpool, the threadpool is getting exhausted and there’s a
cascading failure.  However, that threadpool is the same size as we had it
on Shindig 2.0.0, so I really can’t explain the difference.

I’m really wondering about any differences between 2.0.0 and 2.5.0-update1
that might cause additional HTTP calls to be made, and whether you can
configure or code shindig not to make these loopbacks.

-Matt

On 9/5/14, 12:42 PM, "Ryan Baxter" <rbaxte...@apache.org> wrote:

>So have you looked at what resources the fetcher is fetching?
>
>On Fri, Sep 5, 2014 at 12:17 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
>> Yes, we have.  During a couple of the outages we did a thread dump and
>> saw that all (or almost all) of the threads were blocking on the
>> BasicHTTPFetcher fetch method. We also saw the number of threads jump up
>> to around the same number of threads we have in our Tomcat HTTP thread
>> pool (300).
>>
>> From the best I can tell, it seems as though the issue is that there are
>> now MORE calls to the various shindig servlets being made which is
>> causing all of the HTTP threads to get consumed, but we can’t explain
>> why as the load is the same. Once we roll back to the version of the
>> application which uses shindig 2.0.0, everything is absolutely fine.
>>
>> I’m very hesitant to just increase the thread pool without a good
>> understanding of what could cause this.  If someone knows something that
>> changed between the 2.0.0 and 2.5.0-update1 versions that may have
>> caused more calls to be made whether through the opensocial java API or
>> internally inside shindig that would be great to know.
>>
>> Or perhaps a configuration parameter was introduced that we have set
>> incorrectly and that is causing all these extra calls?
>>
>> We have already made sure our HTTP responses are cached at a very high
>> level per your excellent advice. However, because the majority of the
>> calls which seem to be taking a long time are RPC calls, it doesn’t
>> appear these get cached anyway, so that wouldn’t affect this problem.
>>
>> And if someone knows the answers to the configuration/extension
>> questions about pipelining, that would be great.
>>
>> Thanks!
>>
>> -Matt
>>
>> On 9/5/14, 11:35 AM, "Ryan Baxter" <rbaxte...@apache.org> wrote:
>>
>>>So Matt, have you looked into what those threads are doing?  I agree
>>>that it seems odd that with 2.5.0-update1 you are running out of
>>>threads, but it is hard to pinpoint the reason without knowing what all
>>>those extra threads might be doing.
>>>
>>>
>>>On Thu, Sep 4, 2014 at 11:04 AM, Merrill, Matt <mmerr...@mitre.org>
>>>wrote:
>>>> Hi all,
>>>>
>>>> I haven’t heard back on this, so I thought I’d provide some more
>>>> information in the hopes that perhaps someone has some ideas as to
>>>> what could be causing the issues we’re seeing with shindig’s
>>>> “loopback” http calls.
>>>>
>>>> Under load we hit a deadlock-like situation because of the HTTP calls
>>>> shindig makes to itself when pipelining gadget data. Basically, the
>>>> HTTP request threadpools inside our Shindig Tomcat container are
>>>> getting maxed out, and when shindig makes an http rpc call to itself
>>>> to render a gadget which pipelines data, the request gets held up
>>>> waiting for the rpc call, which in turn may be blocked waiting for the
>>>> Tomcat container to free up a thread to handle it.  This only happens
>>>> under load, of course.
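>>>>
>>>> To put rough numbers on it (assuming each gadget render makes a single
>>>> loopback /rpc call): with a 300-thread pool, once roughly 150 renders
>>>> are in flight, every thread is either a render waiting on its loopback
>>>> call or a loopback call waiting for a free thread, so the pool can
>>>> wedge even though no individual request is slow.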
>>>>
>>>> This is puzzling to me because when we were running Shindig 2.0.0, we
>>>> had the same size threadpool, and now that we’ve upgraded to Shindig
>>>> 2.5.0-update1, the threadpools seem to be getting maxed out.  I took
>>>> some timings inside our various shindig SPI implementations
>>>> (PersonService, AppDataService) and I didn’t see anything alarming.
>>>> There are also no spikes in user traffic.
>>>>
>>>> As I see it, we have a few options I could explore:
>>>>
>>>> 1) The “nuclear” option would be to simply increase our tomcat HTTP
>>>> threadpools, but that doesn’t seem prudent since the old version of
>>>> shindig worked just fine with that thread pool setting.  I feel like a
>>>> greater problem is being masked. Is there anything that changed
>>>> between Shindig 2.0.0 and 2.5.0-update1 that could have caused some
>>>> kind of increase in traffic to shindig?  I tried looking at release
>>>> notes in Jira, but that honestly wasn’t very helpful at all.
>>>>
>>>> 2) Re-configure Shindig to use implemented SPI methods (java method
>>>> calls) instead of making HTTP calls to itself through the RPC API
>>>> shindig exposes?  Based on Stanton’s note below, it seems like there
>>>> are some configuration options for the RPC calls, but they’re mostly
>>>> related to how the client-side javascript makes the calls.  Is there
>>>> anything server side I can configure?  Perhaps with Guice modules?
>>>>
>>>> 3) Explore whether there are hooks in the code that would let us
>>>> write custom code to do this. I see that the javadoc for
>>>> PipelinedDataPreloader.executeSocialRequest mentions that:
>>>> "Subclasses can override to provide special handling (e.g., directly
>>>> invoking a local API)”  However, I’m missing something because I can’t
>>>> find where the preloader gets instantiated.  I see that the
>>>> PipelineExecutor takes in a Guice-injected instance of
>>>> PipelinedDataPreloader, however, I don’t see it getting created
>>>> anywhere.  Where is this being configured?
>>>
>>>The intention was probably to make this possible via Guice, but there
>>>is no interface you can bind an implementation to.  You would have to
>>>replace the classes where PipelinedDataPreloader is used and then
>>>keep going up the chain until you get to a class where you can inject
>>>something via Guice.  Looks like a messy situation right now with the
>>>current way the code is written.
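>>>
>>>To make that concrete, this is roughly the shape of what you would end
>>>up writing.  It is an untested sketch, and I am going from memory on the
>>>package names, the super-constructor arguments, and the
>>>executeSocialRequest signature, so treat all of those as assumptions to
>>>verify against the 2.5.0-update1 source:
>>>
>>>  import org.apache.shindig.config.ContainerConfig;
>>>  import org.apache.shindig.gadgets.GadgetException;
>>>  import org.apache.shindig.gadgets.http.HttpRequest;
>>>  import org.apache.shindig.gadgets.http.HttpResponse;
>>>  import org.apache.shindig.gadgets.http.RequestPipeline;
>>>  import org.apache.shindig.gadgets.preload.PipelinedDataPreloader;
>>>
>>>  import com.google.inject.AbstractModule;
>>>  import com.google.inject.Inject;
>>>
>>>  /** Preloader meant to satisfy pipelined social requests in-process. */
>>>  public class LocalPipelinedDataPreloader extends PipelinedDataPreloader {
>>>
>>>    @Inject
>>>    public LocalPipelinedDataPreloader(RequestPipeline pipeline,
>>>                                       ContainerConfig config) {
>>>      super(pipeline, config);  // assumed super-constructor signature
>>>    }
>>>
>>>    @Override
>>>    protected HttpResponse executeSocialRequest(HttpRequest request)
>>>        throws GadgetException {
>>>      // This is where you would dispatch the social request to the
>>>      // handler layer in the same JVM instead of letting it go out as a
>>>      // loopback HTTP call to /rpc.  Until that local dispatch is
>>>      // written, fall through to the default (loopback) behavior.
>>>      return super.executeSocialRequest(request);
>>>    }
>>>  }
>>>
>>>  /**
>>>   * Swaps the preloader in, assuming Guice will let us bind the concrete
>>>   * PipelinedDataPreloader class to this subclass; if not, you are back
>>>   * to replacing its consumers as described above.
>>>   */
>>>  class LocalPreloaderModule extends AbstractModule {
>>>    @Override
>>>    protected void configure() {
>>>      bind(PipelinedDataPreloader.class)
>>>          .to(LocalPipelinedDataPreloader.class);
>>>    }
>>>  }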
>>>
>>>>
>>>> Any help is appreciated!
>>>>
>>>> Thanks!
>>>> -Matt
>>>>
>>>> On 8/25/14, 4:55 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>>>>
>>>>>Thanks Stanton!
>>>>>
>>>>>I’m assuming that you mean the javascript calls will call listMethods
>>>>>and then make any necessary RPC calls, is that correct?  Is there any
>>>>>other documentation on the introspection part?
>>>>>
>>>>>The reason I ask is that we’re having problems server side when
>>>>>Shindig is pipelining data.  For example, when you do the following in
>>>>>a gadget:
>>>>>
>>>>>  <os:ViewerRequest key="viewer" />
>>>>>  <os:DataRequest key="appData" method="appdata.get" userId="@viewer"
>>>>>                  appId="@app"/>
>>>>>
>>>>>
>>>>>Shindig appears to make HTTP requests to its own rpc endpoint in the
>>>>>process of rendering the gadget.  I could be missing something
>>>>>fundamental here, but is there any way to configure this differently
>>>>>so that shindig simply uses its SPI methods to retrieve this data
>>>>>instead?  Is this really just more of a convenience for the gadget
>>>>>developer than anything else?
>>>>>
>>>>>-Matt
>>>>>
>>>>>On 8/20/14, 4:14 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>>>>
>>>>>>Hi Matt,
>>>>>>
>>>>>>This behavior is configured in container.js in the "gadgets.features"
>>>>>>object.  If you look for "osapi" and "osapi.services", you'll see
>>>>>>some comments about this configuration and the behavior.
>>>>>>features/container/service.js is where this configuration is used and
>>>>>>where the osapi services are instantiated.  As you've seen, Shindig
>>>>>>introspects to find available services by default.
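>>>>>>
>>>>>>From memory, the relevant entries in the shipped container.js look
>>>>>>roughly like the snippet below (please double-check against the copy
>>>>>>you actually deploy, since the exact keys, comments, and endpoint
>>>>>>template may differ between releases):
>>>>>>
>>>>>>  "gadgets.features": {
>>>>>>    "osapi.services": {
>>>>>>      // Binding "gadgets.rpc" to ["container.listMethods"] tells the
>>>>>>      // container to introspect the /rpc endpoint (system.listMethods)
>>>>>>      // at runtime to discover the available osapi services.
>>>>>>      "gadgets.rpc": ["container.listMethods"]
>>>>>>    },
>>>>>>    "osapi": {
>>>>>>      // The endpoint(s) queried for available JSON-RPC/REST services.
>>>>>>      "endPoints": ["http://%host%${CONTEXT_ROOT}/rpc"]
>>>>>>    }
>>>>>>  }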
>>>>>>
>>>>>>If I knew at one point why this behaves this way, I've since
>>>>>>forgotten.  There is a system.listMethods API[1] defined in the Core
>>>>>>API Server spec that this might simply be re-using to discover the
>>>>>>available services.
>>>>>>
>>>>>>I hope that helps.
>>>>>>
>>>>>>-Stanton
>>>>>>
>>>>>>[1]
>>>>>>http://opensocial.github.io/spec/trunk/Core-API-Server.xml#System-Service-ListMethods
>>>>>>
>>>>>>
>>>>>>On Tue, Aug 19, 2014 at 8:13 AM, Merrill, Matt <mmerr...@mitre.org>
>>>>>>wrote:
>>>>>>
>>>>>>> Good morning,
>>>>>>>
>>>>>>> I’m hoping some shindig veterans can help shed some light on the
>>>>>>> reason that Shindig makes HTTP rpc calls to itself as part of the
>>>>>>> gadget rendering process.  Why is this done as opposed to retrieving
>>>>>>> information via internal Java method calls?  We are having lots of
>>>>>>> issues where this approach seems to be causing a cascading failure
>>>>>>> when calls get hung up in the HTTPFetcher class.
>>>>>>>
>>>>>>> Also, I’m curious which calls are made in this manner and how they
>>>>>>> can be configured.  I have seen retrieval of viewer data done this
>>>>>>> way, as well as application data.
>>>>>>>
>>>>>>> I’ve looked for documentation on this topic before and have not
>>>>>>> seen any.  Any help is much appreciated.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Matt Merrill
>>>>>>>
>>>>>
>>>>
>>
