Yes, we have.  During a couple of the outages we did a thread dump and saw
that all (or almost all) of the threads were blocked in the
BasicHttpFetcher fetch method. We also saw the thread count climb to
roughly the size of our Tomcat HTTP thread pool (300).
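
In case it helps anyone reproduce this: a minimal in-JVM sketch of that
check is below. It would need to run inside the Shindig/Tomcat JVM (e.g.
from a diagnostic servlet), and the BasicHttpFetcher class and fetch method
names it matches on are assumptions about our build, so adjust as needed.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public final class FetcherThreadCheck {

  // Counts live threads whose current stack is inside BasicHttpFetcher.fetch.
  // The class/method names matched below are assumptions about our build;
  // adjust them to whatever fetcher your Shindig version actually uses.
  public static int countThreadsInFetch() {
    int count = 0;
    ThreadInfo[] threads =
        ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
    for (ThreadInfo info : threads) {
      for (StackTraceElement frame : info.getStackTrace()) {
        if (frame.getClassName().endsWith("BasicHttpFetcher")
            && "fetch".equals(frame.getMethodName())) {
          count++;
          break;
        }
      }
    }
    return count;
  }
}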

As best I can tell, MORE calls are now being made to the various Shindig
servlets, which is consuming all of the HTTP threads, but we can’t explain
why since the load is the same. Once we roll back to the version of the
application that uses Shindig 2.0.0, everything is absolutely fine.

I’m very hesitant to just increase the thread pool without a good
understanding of what could cause this.  If someone knows of something that
changed between 2.0.0 and 2.5.0-update1 that may have caused more calls to
be made, whether through the OpenSocial Java API or internally inside
Shindig, that would be great to know.

Or perhaps a configuration parameter was introduced that we have set
incorrectly, and that is causing all these extra calls?

We have already made sure our HTTP responses are cached at a very high
level, per your excellent advice. However, the majority of the calls that
seem to be taking a long time are RPC calls, which don’t appear to be
cached anyway, so that wouldn’t affect this problem.

And if someone knows the answers to the configuration/extension questions
about pipelining, that would be great.
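
To make the extension question concrete, the kind of override I have in
mind for option 3 in my earlier message below would look roughly like the
sketch that follows. This is only a sketch: the constructor and method
signatures (and package names) are assumptions to be verified against the
2.5.0-update1 source, and the open question is still where/how to get
Shindig to instantiate such a subclass.

import com.google.inject.Inject;
import org.apache.shindig.config.ContainerConfig;
import org.apache.shindig.gadgets.GadgetException;
import org.apache.shindig.gadgets.http.HttpRequest;
import org.apache.shindig.gadgets.http.HttpResponse;
import org.apache.shindig.gadgets.http.RequestPipeline;
import org.apache.shindig.gadgets.preload.PipelinedDataPreloader;

public class LocalApiPipelinedDataPreloader extends PipelinedDataPreloader {

  // Assumed constructor signature; verify against PipelinedDataPreloader
  // in the 2.5.0-update1 source before relying on this.
  @Inject
  public LocalApiPipelinedDataPreloader(RequestPipeline pipeline,
                                        ContainerConfig config) {
    super(pipeline, config);
  }

  // Per the javadoc quoted below, subclasses can override this to provide
  // special handling (e.g., directly invoking a local API) instead of
  // letting Shindig loop back over HTTP to its own /rpc endpoint.
  @Override
  protected HttpResponse executeSocialRequest(HttpRequest request)
      throws GadgetException {
    // This is where we would invoke our PersonService/AppDataService SPI
    // implementations in-process and build an HttpResponse from the result.
    // Falling through to the default (HTTP loopback) behavior for anything
    // we don't handle locally:
    return super.executeSocialRequest(request);
  }
}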

Thanks!

-Matt

On 9/5/14, 11:35 AM, "Ryan Baxter" <rbaxte...@apache.org> wrote:

>So Matt, have you looked into what those threads are doing?  I agree
>that it seems odd that with 2.5.0-update1 you are running out of
>threads, but it is hard to pinpoint the reason without knowing what all
>those extra threads might be doing.
>
>
>On Thu, Sep 4, 2014 at 11:04 AM, Merrill, Matt <mmerr...@mitre.org> wrote:
>> Hi all,
>>
>> I haven’t heard back on this, so I thought I’d provide some more
>> information in the hopes that perhaps someone has some ideas as to what
>> could be causing the issues we’re seeing with shindig’s “loopback” http
>> calls.
>>
>> Under load, we hit a deadlock-like situation because of the HTTP calls
>> Shindig makes to itself when pipelining gadget data. Basically, the HTTP
>> request thread pools inside our Shindig Tomcat container are getting
>> maxed out, and when Shindig makes an HTTP RPC call to itself to render a
>> gadget that pipelines data, the request gets held up waiting for the RPC
>> call, which may itself be blocked waiting for the Tomcat container to
>> free a thread to handle the HTTP request.  This only happens under load,
>> of course.
>>
>> This is puzzling to me because when we were running Shindig 2.0.0, we
>> had the same size thread pool, and now that we’ve upgraded to Shindig
>> 2.5.0-update1, the thread pools seem to be getting maxed out.  I took
>> some timings inside our various Shindig SPI implementations
>> (PersonService, AppDataService) and I didn’t see anything alarming.
>> There are also no spikes in user traffic.
>>
>> As I see it, there are a few options I could explore:
>>
>> 1) The “nuclear” option would be to simply increase our Tomcat HTTP
>> thread pools, but that doesn’t seem prudent since the old version of
>> Shindig worked just fine with that thread pool setting.  I feel like a
>> larger problem is being masked. Is there anything that changed between
>> Shindig 2.0.0 and 2.5.0-update1 that could have caused some kind of
>> increase in traffic to Shindig?  I tried looking at the release notes in
>> Jira, but that honestly wasn’t very helpful.
>>
>> 2) Reconfigure Shindig to use the implemented SPI methods (Java method
>> calls) instead of making HTTP calls to itself through the RPC API Shindig
>> exposes?  Based on Stanton’s note below, it seems like there are some
>> configuration options for the RPC calls, but they’re mostly related to
>> how the client-side JavaScript makes the calls.  Is there anything
>> server-side I can configure?  Perhaps with Guice modules?
>>
>> 3) Explore whether there are hooks in the code that would let us write
>> custom code to do this. I see that the javadoc for
>> PipelinedDataPreloader.executeSocialRequest mentions: "Subclasses can
>> override to provide special handling (e.g., directly invoking a local
>> API)."  However, I’m missing something because I can’t find where the
>> preloader gets instantiated.  I see that PipelineExecutor takes in a
>> Guice-injected instance of PipelinedDataPreloader, but I don’t see it
>> being created anywhere.  Where is this being configured?
>
>The intention was probably to make this possible via Guice, but there
>is no interface you can bind an implementation to.  You would have to
>replace the classes where PipelinedDataPreloader is used and then keep
>going up the chain until you get to a class where you can inject
>something via Guice.  It looks like a messy situation right now, given
>the way the code is currently written.
>
>>
>> Any help is appreciated!
>>
>> Thanks!
>> -Matt
>>
>> On 8/25/14, 4:55 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>>
>>>Thanks Stanton!
>>>
>>>I’m assuming you mean that the JavaScript calls will call listMethods and
>>>then make any necessary RPC calls, is that correct?  Is there any other
>>>documentation on the introspection part?
>>>
>>>The reason I ask is that we’re having problems server-side when Shindig
>>>is pipelining data.  For example, when you do the following in a gadget:
>>><os:ViewerRequest key="viewer" />
>>>    <os:DataRequest key="appData" method="appdata.get" userId="@viewer"
>>>        appId="@app"/>
>>>
>>>
>>>Shindig appears to make HTTP requests to its own RPC endpoint in the
>>>process of rendering the gadget.  I could be missing something
>>>fundamental here, but is there any way to configure this differently so
>>>that Shindig simply uses its SPI methods to retrieve this data instead?
>>>Is this really just more of a convenience for the gadget developer than
>>>anything else?
>>>
>>>-Matt
>>>
>>>On 8/20/14, 4:14 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>>
>>>>Hi Matt,
>>>>
>>>>This behavior is configured in container.js in the "gadgets.features"
>>>>object.  If you look for "osapi" and "osapi.services", you'll see some
>>>>comments about this configuration and the behavior.
>>>>features/container/service.js is where this configuration is used and
>>>>where the osapi services are instantiated.  As you've seen, Shindig
>>>>introspects to find available services by default.
>>>>
>>>>If I knew at one point why this behaves this way, I've since forgotten.
>>>>There is a system.listMethods API[1] defined in the Core API Server spec
>>>>that this might simply be re-using to discover the available services.
>>>>
>>>>I hope that helps.
>>>>
>>>>-Stanton
>>>>
>>>>[1]
>>>>http://opensocial.github.io/spec/trunk/Core-API-Server.xml#System-Service-ListMethods
>>>>
>>>>
>>>>On Tue, Aug 19, 2014 at 8:13 AM, Merrill, Matt <mmerr...@mitre.org>
>>>>wrote:
>>>>
>>>>> Good morning,
>>>>>
>>>>> I’m hoping some Shindig veterans can help shed some light on the
>>>>> reason that Shindig makes HTTP RPC calls to itself as part of the
>>>>> gadget rendering process.  Why is this done as opposed to retrieving
>>>>> information via internal Java method calls?  We are having lots of
>>>>> issues where this approach seems to be causing a cascading failure
>>>>> when calls get hung up in the HttpFetcher class.
>>>>>
>>>>> Also, I’m curious which calls are made in this manner and how they
>>>>> can be configured.  I have seen retrieval of viewer data done this
>>>>> way, as well as application data.
>>>>>
>>>>> I’ve looked for documentation on this topic before and have not seen
>>>>> any.  Any help is much appreciated.
>>>>>
>>>>> Thanks!
>>>>> -Matt Merrill
>>>>>
>>>
>>
