Matt, can you tell us more about how you have configured the caches in shindig?
When you are rendering these gadgets, are you rendering the same gadget across
all users?

-Ryan

> On Jul 9, 2014, at 3:31 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
> 
> Stanton, 
> 
> Thanks for responding!
> 
> This is one instance of shindig.
> 
> If you mean the configuration within the container and for the shindig
> java app, then yes, the locked domains are the same.  In fact, the
> configuration, with the exception of shindig's host URLs, is exactly the
> same from what I can tell.
> 
> Unfortunately, I don't have any way to trace that exact message, but I did
> do a traceroute from the server running shindig to the URL that is being
> called for rpc calls, to make sure there weren't any extra network hops.
> There weren't; it only had one hop, as expected for an app making an HTTP
> call to itself.
> 
> Thanks again for responding.
> 
> -Matt
> 
>> On 7/9/14, 3:08 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>> 
>> Hi Matt,
>> 
>> Is the configuration for locked domains and security tokens consistent
>> between your test and production environments?
>> 
>> Do you have any way of tracing the request in the log entry you provided
>> through the network?  Is this a single Shindig server or is there any load
>> balancing occurring?
>> 
>> Regards,
>> -Stanton
>> 
>> 
>>> On Wed, Jul 9, 2014 at 2:40 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>> 
>>> Hi shindig devs,
>>> 
>>> We are in the process of upgrading from shindig 2.0 to 2.5-update1.
>>> Everything had gone OK until we got into our production environment,
>>> where we are seeing significant slowdowns in the opensocial RPC calls
>>> that shindig makes to itself when rendering a gadget.
>>> 
>>> This is obviously very dependent on how we've implemented the shindig
>>> interfaces in our own code, and also on our infrastructure. However,
>>> we're hoping someone on the list can give us some more ideas for areas
>>> to investigate, inside shindig itself or in general.
>>> 
>>> Here's what's happening:
>>> * Gadgets load fine when the app is not under much load (< 10 users
>>> rendering 10-12 gadgets on a page)
>>> * Once a reasonable subset of users begins rendering gadgets, gadget
>>> render calls through the "ifr" endpoint start taking a very long time
>>> to respond
>>> * The problem gets worse from there
>>> * Even with extensive load testing we can't recreate this problem in
>>> our testing environments
>>> * Our system administrators have assured us that the configurations of
>>> our servers are the same between int and prod
>>> 
>>> This is an example of what we're seeing in the logs from
>>> BasicHttpFetcher:
>>> 
>>> 
>>> http://238redacteddnsprefix234.gadgetsv2.company.com:7001/gmodules/rpc?st
>>> =mycontainer%3AvY2rb-teGXuk9HX8d6W0rm6wE6hkLxM95ByaSMQlV8RudwohiAFqAliywV
>>> wc5yQ8maFSwK7IEhogNVnoUXa-doA3_h7EbSDGq_DW5i_VvC0CFEeaTKtr70A9XgYlAq5T95j
>>> 7mivGO3lXVBTayU2PFNSdnLu8xtQEJJ7YrlmekEYyERmTSQmi7n2wZlmnG2puxVkegQKWNpdz
>>> OH4xCfgROnNCnAI
>>> is responding slowly. 12,449 ms elapsed.
>>> 
>>> We'll continue to get these warnings for rpc calls for many different
>>> gadgets, the amount of time elapsed grows, and ultimately every gadget
>>> render slows to a crawl.
>>> 
>>> Some other relevant information:
>>> * We have implemented "throttling" logic in our own custom HttpFetcher,
>>> which extends BasicHttpFetcher. It keeps track of how many outgoing
>>> requests are in flight for a given url, and if too many are running
>>> concurrently, it starts rejecting further outgoing requests. This was
>>> done to avoid a situation where a slowly responding external service
>>> ties up all of shindig's external http connections. In our case, I
>>> believe that because our rpc endpoint is taking so long to respond, we
>>> start rejecting these requests with our throttling logic.
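The per-URL throttling described above could be sketched roughly like this. To be clear, `PerUrlThrottle` and its method names are illustrative assumptions, not Matt's actual implementation; the idea is simply a per-URL in-flight counter that rejects requests once a concurrency ceiling is hit:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-URL request throttling, as one might wire
// into a custom HttpFetcher that extends BasicHttpFetcher.
public class PerUrlThrottle {
    private final int maxConcurrent;
    // One in-flight counter per URL.
    private final ConcurrentMap<String, AtomicInteger> inFlight =
        new ConcurrentHashMap<>();

    public PerUrlThrottle(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Returns true if the request may proceed, false if it should be rejected. */
    public boolean tryAcquire(String url) {
        AtomicInteger count =
            inFlight.computeIfAbsent(url, k -> new AtomicInteger());
        if (count.incrementAndGet() > maxConcurrent) {
            count.decrementAndGet(); // over the limit: roll back and reject
            return false;
        }
        return true;
    }

    /** Must be called when the request completes, whether it succeeded or failed. */
    public void release(String url) {
        AtomicInteger count = inFlight.get(url);
        if (count != null) {
            count.decrementAndGet();
        }
    }
}
```

One thing worth noting with a scheme like this: if the rpc endpoint itself slows down, its in-flight count stays pinned at the ceiling and the throttle rejects further rpc calls, which matches the failure mode described above.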
>>> 
>>> I have tried to trace through the rpc calls inside the shindig code
>>> starting in the RpcServlet, and as best I can tell, these rpc calls are
>>> used for:
>>> * getting viewer data
>>> * getting application data
>>> * anything else?
>>> 
>>> I've also looked at the BasicHttpFetcher, but nothing stands out at
>>> first glance that would cause such a difference in performance between
>>> environments if, as our sys admins say, they are the same.
>>> 
>>> Additionally, I've ensured that the database table which contains our
>>> Application Data is indexed properly (by person ID and gadget url) and
>>> that person data is cached.
>>> 
>>> Any other ideas, or areas in the codebase to explore are very much
>>> appreciated.
>>> 
>>> Thanks!
>>> -Matt
> 