Matt, can you tell us more about how you have configured the caches in shindig? When you are rendering these gadgets, are you rendering the same gadget across all users?
-Ryan

> On Jul 9, 2014, at 3:31 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>
> Stanton,
>
> Thanks for responding!
>
> This is one instance of shindig.
>
> If you mean the configuration within the container and for the shindig
> java app, then yes, the locked domains are the same. In fact, the
> configuration, with the exception of shindig's host URLs, is exactly the
> same from what I can tell.
>
> Unfortunately, I don't have any way to trace that exact message, but I
> did do a traceroute from the server running shindig to the URL that is
> being called for rpc calls to make sure there weren't any extra network
> hops. There weren't; it actually had only one hop, as expected for an app
> making an HTTP call to itself.
>
> Thanks again for responding.
>
> -Matt
>
>> On 7/9/14, 3:08 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>
>> Hi Matt,
>>
>> Is the configuration for locked domains and security tokens consistent
>> between your test and production environments?
>>
>> Do you have any way of tracing the request in the log entry you provided
>> through the network? Is this a single Shindig server, or is there any
>> load balancing occurring?
>>
>> Regards,
>> -Stanton
>>
>>
>>> On Wed, Jul 9, 2014 at 2:40 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>>
>>> Hi shindig devs,
>>>
>>> We are in the process of upgrading from shindig 2.0 to 2.5-update1, and
>>> everything has gone OK. However, once we got into our production
>>> environment, we are seeing significant slowdowns in the opensocial RPC
>>> calls that shindig makes to itself when rendering a gadget.
>>>
>>> This is obviously very dependent on how we've implemented the shindig
>>> interfaces in our own code, and also on our infrastructure, so we're
>>> hoping someone on the list can help give us some more ideas for areas
>>> to investigate, inside shindig itself or in general.
>>>
>>> Here's what's happening:
>>> * Gadgets load fine when the app is not experiencing much load (< 10
>>> users rendering 10-12 gadgets on a page)
>>> * Once a reasonable subset of users begins rendering gadgets, gadget
>>> render calls through the "ifr" endpoint start taking a very long time
>>> to respond
>>> * The problem gets worse from there
>>> * Even with extensive load testing, we can't recreate this problem in
>>> our testing environments
>>> * Our system administrators have assured us that the configurations of
>>> our servers are the same between int and prod
>>>
>>> This is an example of what we're seeing from the logs inside
>>> BasicHttpFetcher:
>>>
>>> http://238redacteddnsprefix234.gadgetsv2.company.com:7001/gmodules/rpc?st
>>> =mycontainer%3AvY2rb-teGXuk9HX8d6W0rm6wE6hkLxM95ByaSMQlV8RudwohiAFqAliywV
>>> wc5yQ8maFSwK7IEhogNVnoUXa-doA3_h7EbSDGq_DW5i_VvC0CFEeaTKtr70A9XgYlAq5T95j
>>> 7mivGO3lXVBTayU2PFNSdnLu8xtQEJJ7YrlmekEYyERmTSQmi7n2wZlmnG2puxVkegQKWNpdz
>>> OH4xCfgROnNCnAI
>>> is responding slowly. 12,449 ms elapsed.
>>>
>>> We'll continue to get these warnings for rpc calls for many different
>>> gadgets, the amount of time elapsed will grow, and ultimately every
>>> gadget render slows to a crawl.
>>>
>>> Some other relevant information:
>>> * We have implemented "throttling" logic in our own custom HttpFetcher,
>>> which extends the BasicHttpFetcher. Basically, it keeps track of how
>>> many outgoing requests are in flight for a given url, and if there are
>>> too many running concurrently, it starts rejecting outgoing requests.
>>> This was done to avoid a situation where a slowly responding external
>>> service ties up all of shindig's external http connections. In our
>>> case, I believe that because our rpc endpoint is taking so long to
>>> respond, we start rejecting these requests with our throttling logic.
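[Editor's note: the custom throttling fetcher itself is not shown in the thread. A minimal sketch of the per-URL concurrency limiting described above might look like the following; the class name, method names, host string, and limit are all hypothetical, not taken from the actual implementation.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: each host gets a fixed pool of permits, and a
// request that cannot obtain a permit is rejected rather than queued.
public class PerHostThrottle {
    private final int maxConcurrentPerHost;
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    public PerHostThrottle(int maxConcurrentPerHost) {
        this.maxConcurrentPerHost = maxConcurrentPerHost;
    }

    /**
     * Returns true if a request to this host may proceed.
     * The caller must call release() when the request finishes.
     */
    public boolean tryAcquire(String host) {
        return permits
            .computeIfAbsent(host, h -> new Semaphore(maxConcurrentPerHost))
            .tryAcquire();
    }

    /** Frees the permit held by a completed (or failed) request. */
    public void release(String host) {
        Semaphore s = permits.get(host);
        if (s != null) {
            s.release();
        }
    }
}
```

Note how this design reproduces the failure mode Matt describes: when the rpc endpoint responds slowly, permits stay held for the full duration of each slow call, so subsequent requests to the same host are rejected even though nothing is wrong with them.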
>>>
>>> I have tried to trace through the rpc calls inside the shindig code,
>>> starting in the RpcServlet, and as best I can tell, these rpc calls
>>> are used for:
>>> * getting viewer data
>>> * getting application data
>>> * anything else?
>>>
>>> I've also looked at the BasicHttpFetcher, but nothing stands out to me
>>> at first glance that would cause such a difference in performance
>>> between environments if, as our sys admins say, they are the same.
>>>
>>> Additionally, I've ensured that the database table which contains our
>>> Application Data is indexed properly (by person ID and gadget url) and
>>> that person data is cached.
>>>
>>> Any other ideas, or areas in the codebase to explore, are very much
>>> appreciated.
>>>
>>> Thanks!
>>> -Matt
>
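[Editor's note: the person-data cache mentioned above is also not shown. As a purely illustrative aside (this is not Shindig's cache implementation, which is pluggable via its cache providers), a minimal bounded LRU cache of the kind that might front such lookups can be built on `LinkedHashMap` in access order:]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a small bounded LRU cache. The access-order
// constructor flag makes iteration order reflect recency of use, and
// removeEldestEntry evicts the least recently used entry once the
// cache exceeds its capacity.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, enabling LRU eviction
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```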