Matt, can you tell us more about how you have configured the caches in shindig? When you are rendering these gadgets, are you rendering the same gadget across all users?
-Ryan

> On Jul 9, 2014, at 3:31 PM, "Merrill, Matt" <mmerr...@mitre.org> wrote:
>
> Stanton,
>
> Thanks for responding!
>
> This is one instance of shindig.
>
> If you mean the configuration within the container and for the shindig
> java app, then yes, the locked domains are the same. In fact, the
> configuration, with the exception of shindig's host URLs, is exactly the
> same from what I can tell.
>
> Unfortunately, I don't have any way to trace that exact message, but I
> did do a traceroute from the server running shindig to the URL that is
> being called for rpc calls to make sure there weren't any extra network
> hops. There weren't; it actually had only one hop, as expected for an app
> making an HTTP call to itself.
>
> Thanks again for responding.
>
> -Matt
>
>> On 7/9/14, 3:08 PM, "Stanton Sievers" <ssiev...@apache.org> wrote:
>>
>> Hi Matt,
>>
>> Is the configuration for locked domains and security tokens consistent
>> between your test and production environments?
>>
>> Do you have any way of tracing the request in the log entry you provided
>> through the network? Is this a single Shindig server, or is there any
>> load balancing occurring?
>>
>> Regards,
>> -Stanton
>>
>>
>>> On Wed, Jul 9, 2014 at 2:40 PM, Merrill, Matt <mmerr...@mitre.org> wrote:
>>>
>>> Hi shindig devs,
>>>
>>> We are in the process of upgrading from shindig 2.0 to 2.5-update1, and
>>> everything has gone OK. However, once we got into our production
>>> environment, we are seeing significant slowdowns in the opensocial RPC
>>> calls that shindig makes to itself when rendering a gadget.
>>>
>>> This is obviously very dependent on how we've implemented the shindig
>>> interfaces in our own code, and also on our infrastructure, so we're
>>> hoping someone on the list can help give us some more ideas for areas
>>> to investigate, inside shindig itself or in general.
>>>
>>> Here's what's happening:
>>> * Gadgets load fine when the app is not experiencing much load (< 10
>>> users rendering 10-12 gadgets on a page)
>>> * Once a reasonable subset of users begins rendering gadgets, gadget
>>> render calls through the "ifr" endpoint start taking a very long time
>>> to respond
>>> * The problem gets worse from there
>>> * Even with extensive load testing, we can't recreate this problem in
>>> our testing environments
>>> * Our system administrators have assured us that the configurations of
>>> our servers are the same between int and prod
>>>
>>> This is an example of what we're seeing from the logs inside
>>> BasicHttpFetcher:
>>>
>>> http://238redacteddnsprefix234.gadgetsv2.company.com:7001/gmodules/rpc?st
>>> =mycontainer%3AvY2rb-teGXuk9HX8d6W0rm6wE6hkLxM95ByaSMQlV8RudwohiAFqAliywV
>>> wc5yQ8maFSwK7IEhogNVnoUXa-doA3_h7EbSDGq_DW5i_VvC0CFEeaTKtr70A9XgYlAq5T95j
>>> 7mivGO3lXVBTayU2PFNSdnLu8xtQEJJ7YrlmekEYyERmTSQmi7n2wZlmnG2puxVkegQKWNpdz
>>> OH4xCfgROnNCnAI
>>> is responding slowly. 12,449 ms elapsed.
>>>
>>> We'll continue to get these warnings for rpc calls for many different
>>> gadgets, the amount of time elapsed will grow, and ultimately every
>>> gadget render slows to a crawl.
>>>
>>> Some other relevant information:
>>> * We have implemented "throttling" logic in our own custom HttpFetcher,
>>> which extends the BasicHttpFetcher. Basically, it keeps track of how
>>> many outgoing requests are in flight for a given url, and if there are
>>> too many running concurrently, it starts rejecting outgoing requests.
>>> This was done to avoid a situation where a slowly responding external
>>> service ties up all of shindig's external http connections. In our
>>> case, I believe that because our rpc endpoint is taking so long to
>>> respond, we start rejecting these requests with our throttling logic.
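[Editor's note: the custom throttling fetcher itself is not shown in the thread. A minimal sketch of the per-URL concurrency limiting described above might look like the following; the class name, method names, host string, and limit are all hypothetical, not taken from the actual implementation.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: each host gets a fixed pool of permits, and a
// request that cannot obtain a permit is rejected rather than queued.
public class PerHostThrottle {
    private final int maxConcurrentPerHost;
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    public PerHostThrottle(int maxConcurrentPerHost) {
        this.maxConcurrentPerHost = maxConcurrentPerHost;
    }

    /**
     * Returns true if a request to this host may proceed.
     * The caller must call release() when the request finishes.
     */
    public boolean tryAcquire(String host) {
        return permits
            .computeIfAbsent(host, h -> new Semaphore(maxConcurrentPerHost))
            .tryAcquire();
    }

    /** Frees the permit held by a completed (or failed) request. */
    public void release(String host) {
        Semaphore s = permits.get(host);
        if (s != null) {
            s.release();
        }
    }
}
```

Note how this design reproduces the failure mode Matt describes: when the rpc endpoint responds slowly, permits stay held for the full duration of each slow call, so subsequent requests to the same host are rejected even though nothing is wrong with them.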
>>>
>>> I have tried to trace through the rpc calls inside the shindig code,
>>> starting in the RpcServlet, and as best I can tell, these rpc calls
>>> are used for:
>>> * getting viewer data
>>> * getting application data
>>> * anything else?
>>>
>>> I've also looked at the BasicHttpFetcher, but nothing stands out to me
>>> at first glance that would cause such a difference in performance
>>> between environments if, as our sys admins say, they are the same.
>>>
>>> Additionally, I've ensured that the database table which contains our
>>> Application Data is indexed properly (by person ID and gadget url) and
>>> that person data is cached.
>>>
>>> Any other ideas, or areas in the codebase to explore, are very much
>>> appreciated.
>>>
>>> Thanks!
>>> -Matt
>
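[Editor's note: the person-data cache mentioned above is also not shown. As a purely illustrative aside (this is not Shindig's cache implementation, which is pluggable via its cache providers), a minimal bounded LRU cache of the kind that might front such lookups can be built on `LinkedHashMap` in access order:]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a small bounded LRU cache. The access-order
// constructor flag makes iteration order reflect recency of use, and
// removeEldestEntry evicts the least recently used entry once the
// cache exceeds its capacity.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, enabling LRU eviction
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```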