> I have two servers in AWS. One is a live production server (a multi site
> WordPress installation with hundreds of sites and about 5,000 users) and the
> other is a clone of prod that is being configured for a test server. The
> live one has four array servers, an Elastic
> Load Balancer and is connected to a large RDS in AWS. And until yesterday, I
> naively thought our caching was being handled via APC and a WordPress plugin
> here and there. But no. Turns out someone here had added AWS's ElastiCache
> to our live server. For those not in the cloud, ElastiCache is
> essentially AWS's hosted memcached.
>
> Anyway, we tried to enable caching on our test server two days ago and it
> introduced a really strange bug (a redirect mysteriously appeared on our live
> site's main admin dashboard that then went to our test server). So once we
> realized the bug was most likely
> related to a caching system we didn't know we had, we disabled caching. As
> it turned out, when we enabled caching on our test server, it used the same
> ElastiCache server our live server was using (because test was a clone of
> live). So we disabled it when we
> removed/renamed the object-cache.php file.
>
> Disabling it solved our redirect issue, but suddenly, many (not all) of our
> 5,000 users could no longer log into their individual sites. For some
> reason, the values that were in our database were not working for a good
> percentage of users, forcing them to
> reset their passwords instead. Obviously, this is huge with 5,000 users in
> the mix. So we reenabled caching on our live instance and decided to fix our
> cached redirect with WP configuration changes instead (we
> added define('RELOCATE',true); into the config to force
> the redirection to our test server to be overridden).
>
> One of the things we noticed with memcache was that it kept updating our
> wp_options table with the domain for the test server in place of our live
> one. In fact, it's still doing it whenever I run a query to find the string
> for the test domain and update it to the
> live domain. Every few minutes, the caching changes it back. Scary. But it
> looks like our configuration change for now forces an override. The really
> concerning thing about all this was the fact that it seems memcache is
> drawing from its own key:value pairs for the
> user passwords instead of directly from the database. I mean with caching
> enabled, the users can get in. Without it, many users are forced to reset
> their passwords.
>
> Does anyone have any ideas for me as to how to effectively understand what's
> going on with memcache in this case and how to fix it so the database gets
> written to appropriately and so password info isn't just being held in the
> cache? To my thinking it's a ticking
> time bomb. All it would take is one flush_all command to make life very,
> very painful for most of my users.
>
> We are on Nginx with MySQL on the RDS.
This sounds like an issue with WordPress's usage of memcached, not
something about memcached itself. You'll probably get a lot further
asking WordPress people who are familiar with how it uses memcached.
Memcached itself doesn't talk to any database, nor does it have any
application logic; it's just a binary key/value blob store with an API.
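
To make that concrete, here is a rough Python sketch of what likely happens when two installs point at the same cache server. It is not WordPress code: `FakeMemcached`, `Site`, the key layout, and the example.com URLs are all made up for illustration, and the in-memory dict only stands in for a real memcached server. The point is that the store just hands back whatever was last written under a key, so whichever application writes first "wins" unless the keys are namespaced.

```python
class FakeMemcached:
    """In-memory stand-in for memcached: opaque keys mapped to opaque values."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, like a typical memcached client.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


class Site:
    """Hypothetical application that reads options through a cache first."""

    def __init__(self, cache, database, key_prefix=""):
        self.cache = cache
        self.database = database      # a plain dict standing in for the real database
        self.key_prefix = key_prefix  # assumed per-environment namespace

    def get_option(self, name):
        key = f"{self.key_prefix}options:{name}"
        value = self.cache.get(key)
        if value is None:
            # Cache miss: the application (not the cache) reads the database
            # and stores the result for later requests.
            value = self.database[name]
            self.cache.set(key, value)
        return value


def demo(live_prefix, test_prefix):
    """Return the siteurl the live site sees after test has warmed the cache."""
    shared = FakeMemcached()
    test = Site(shared, {"siteurl": "https://test.example.com"}, test_prefix)
    live = Site(shared, {"siteurl": "https://live.example.com"}, live_prefix)
    test.get_option("siteurl")  # the cloned test site populates the cache first
    return live.get_option("siteurl")


print(demo("", ""))            # same keys: live reads the value test cached
print(demo("live:", "test:"))  # distinct prefixes: live reads its own value
```

With identical keys the live site gets the test URL back; with distinct prefixes each environment only sees its own entries. Some WordPress object-cache drop-ins read a `WP_CACHE_KEY_SALT` constant for this kind of namespacing, but check the drop-in you actually have installed, or simply give the test server its own cache endpoint.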