It was iptables. We use tunneling between nodes in the cluster for security, and had some bugs in that.

Thanks,

Joseph Brower

On 03/01/2012 02:09 PM, dormando wrote:
packet mangling via iptables, or something else?

On Thu, 1 Mar 2012, Joseph Brower wrote:

It looked like we had some packet mangling going on.  Talk about a crazy bug
to track down.  I appreciate everyone's help!  It's all resolved now.

Thanks,

Joseph Brower

On 03/01/2012 12:34 PM, Joseph Brower wrote:
To rule out the extension causing issues, I actually used a class I found
that is slow, but doesn't rely on the pecl extensions at all (or the
memcached extension).  The issue still persisted.  I'll see if I can change
the version.  I'll also see if there is any packet mangling that might be
occurring.

Thanks,

Joseph Brower

On 03/01/2012 12:26 PM, dormando wrote:
Can you upgrade to .13 and try again?

You pasted some protocol errors... what version of pecl/memcache is that?
3.x might have trouble with the binary protocol as it was alpha
abandonware.

If they still happen with .13, it might be worth getting a log from
memcached. run it in screen with -vv and redirect the output to a logfile
or pipe through logger to syslog. it could be a lot of lines if things are
busy.

that should tell you if the server sees anything at all.

On Thu, 1 Mar 2012, Joseph Brower wrote:

Yup.  That's how I've got the production environment set up.  We have two
memcache servers, each with a decent amount of RAM.  The same thing
happens there (though not always to both memcache servers; sometimes it
happens to one or the other).  Also, I've written a small memcache test
script that just tries to set and get a very small value.  That works
sometimes, and it fails other times.  That's what I find so odd.
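
A set/get probe like the one described above can be sketched without any PHP extension by speaking the memcached text protocol over a plain socket. This is only a sketch in Python, not the script used in this thread; the helper names (build_set, build_get, parse_get, round_trip) and the key "probe" are made up for illustration.

```python
import socket


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a text-protocol 'set': command line, data block, trailing CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Encode a text-protocol 'get' for a single key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get(response: bytes):
    """Return the value from a single-key 'get' reply, or None on a miss.

    Raises ValueError if the reply is not well formed, which is roughly
    the situation a client reports as a malformed response.
    """
    if response == b"END\r\n":
        return None
    header, sep, rest = response.partition(b"\r\n")
    parts = header.split()
    if not sep or len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"malformed response: {response!r}")
    length = int(parts[3])  # declared size of the data block
    if rest != rest[:length] + b"\r\nEND\r\n" or len(rest) != length + 7:
        raise ValueError(f"malformed response: {response!r}")
    return rest[:length]


def read_until(sock: socket.socket, terminator: bytes) -> bytes:
    """Read from the socket until the buffer ends with the terminator."""
    buf = b""
    while not buf.endswith(terminator):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf


def round_trip(host: str, port: int = 11211, key: str = "probe",
               value: bytes = b"ok", timeout: float = 2.0) -> bool:
    """Set a tiny value and read it back; True only if both steps succeed.

    A reply that never arrives surfaces as a socket timeout exception.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_set(key, value))
        if read_until(sock, b"\r\n") != b"STORED\r\n":
            return False
        sock.sendall(build_get(key))
        return parse_get(read_until(sock, b"END\r\n")) == value
```

Calling round_trip("my.memcachehost.com") in a loop from each webserver and logging the failures would show whether the problem follows a particular client node or a particular memcached server.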

Thanks,

Joseph Brower

On 03/01/2012 01:34 AM, Yiftach Shoolman wrote:
        Hi Joseph,
I guess you know that your Memcached size is only 10MB (STAT
limit_maxbytes 10485760). The Magento Zend cache (your objects) tests this
size prior to setting an object, and if the limit is reached (STAT bytes)
you cannot set any new object in the cache, but you can still set new
sessions. So that might be your problem, though not according to the
stats you sent.
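
The comparison between STAT bytes and STAT limit_maxbytes can be checked mechanically. The following is a rough Python sketch that parses "stats" output and reports how full the cache is; the function names are made up for illustration, and the sample lines are copied from the stats posted in this thread.

```python
def parse_stats(text: str) -> dict:
    """Turn 'STAT name value' lines into a dict of name -> value strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "STAT":
            stats[parts[1]] = " ".join(parts[2:])
    return stats


def memory_usage(stats: dict) -> float:
    """Fraction of the configured memory limit currently used by items."""
    return int(stats["bytes"]) / int(stats["limit_maxbytes"])


def hit_ratio(stats: dict):
    """get_hits / (get_hits + get_misses), or None if there were no gets."""
    hits = int(stats["get_hits"])
    total = hits + int(stats["get_misses"])
    return hits / total if total else None


# A few lines taken from the stats posted in this thread.
SAMPLE = """\
STAT get_hits 40
STAT get_misses 5
STAT limit_maxbytes 10485760
STAT bytes 303
STAT evictions 0
"""

stats = parse_stats(SAMPLE)
print(f"limit: {int(stats['limit_maxbytes']) / 2**20:.0f} MiB")
print(f"used:  {memory_usage(stats):.4%} of limit")
print(f"hits:  {hit_ratio(stats):.1%}")
```

With these numbers the cache is almost empty and nothing has been evicted, which is why the memory limit does not explain the failures here.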

One more thing: it is better to deploy Magento with 2 Memcached servers,
one for the cache and one for the sessions, so whenever you upgrade your
site and flush the objects, you don't also need to flush your sessions.
See the typical configuration of one of our customers below.

Best,

Yiftach

<config>
  <global>
    <session_cache_limiter></session_cache_limiter>
    <session_save><![CDATA[memcache]]></session_save>
    <session_save_path><![CDATA[DNSADDRESS]]></session_save_path>
    <cache>
      <backend>memcached</backend><!-- apc / memcached / xcache / empty=file -->
      <slow_backend>database</slow_backend><!-- database / file (default) - used for 2 levels cache setup, necessary for all shared memory storages -->
      <slow_backend_store_data>0</slow_backend_store_data>
      <memcached><!-- memcached cache backend related config -->
        <servers><!-- any number of server nodes can be included -->
          <server>
            <host><![CDATA[NSADDRESS]]></host>
            <port><![CDATA[10245]]></port>
            <persistent><![CDATA[1]]></persistent>
            <weight><![CDATA[1]]></weight>
            <timeout><![CDATA[10]]></timeout>
            <retry_interval><![CDATA[10]]></retry_interval>
            <status><![CDATA[1]]></status>
          </server>
        </servers>
        <compression><![CDATA[0]]></compression>
        <cache_dir><![CDATA[]]></cache_dir>
        <hashed_directory_level><![CDATA[]]></hashed_directory_level>
        <hashed_directory_umask><![CDATA[]]></hashed_directory_umask>
        <file_name_prefix><![CDATA[]]></file_name_prefix>
      </memcached>
    </cache>
  </global>
</config>




On Thu, Mar 1, 2012 at 9:59 AM, Joseph Brower wrote:
        Thanks for the response.

        I've been testing as best I can and I've found that both setting
        and getting fail.  I get either no output, or a

        Notice: Memcache::set(): Server my.memcachehost.com (tcp 11211) failed with: Received malformed response (0) in /var/www/memcache.php on line 5

        I'm able to continue setting and getting via telnet without any
        issues.  Also, if I redeploy my webserver (onto somewhere else in
        our cluster), things are sometimes happy and sometimes they
        continue to fail.  When I look at netstat, I don't see the
        connections to memcache.  When looking at the output from
        memcached, it doesn't show any additional output (as if the
        connection never reaches it).  I'm confident it's not my firewall
        rules, as I've got everything automated so that my configuration
        is consistent between versions.  I've also ruled out the extension
        being used.  It happens with the memcached extension, the memcache
        extension, and an extensionless method that I found.

        I'm running on Ubuntu 10.04.  None of the other services on this
        cluster have any connection issues (mysql, http, load balancer,
        ssl terminator), and they all use the same script of mine for
        configuring the firewall rules appropriately.

        All of the stats look ok.  I'm not maxing out the connection
        limit, and I am nowhere near the memory limits.  This happens when
        using memcache for sessions and for page cache.
        STAT pid 126
        STAT uptime 2017
        STAT time 1330588738
        STAT version 1.4.10
        STAT libevent 1.4.13-stable
        STAT pointer_size 64
        STAT rusage_user 0.040000
        STAT rusage_system 0.160000
        STAT curr_connections 10
        STAT total_connections 37
        STAT connection_structures 11
        STAT reserved_fds 20
        STAT cmd_get 45
        STAT cmd_set 35
        STAT cmd_flush 0
        STAT cmd_touch 0
        STAT get_hits 40
        STAT get_misses 5
        STAT delete_misses 0
        STAT delete_hits 0
        STAT incr_misses 0
        STAT incr_hits 10
        STAT decr_misses 0
        STAT decr_hits 0
        STAT cas_misses 0
        STAT cas_hits 0
        STAT cas_badval 0
        STAT touch_hits 0
        STAT touch_misses 0
        STAT auth_cmds 0
        STAT auth_errors 0
        STAT bytes_read 1942
        STAT bytes_written 1672
        STAT limit_maxbytes 10485760
        STAT accepting_conns 1
        STAT listen_disabled_num 0
        STAT threads 4
        STAT conn_yields 0
        STAT hash_power_level 16
        STAT hash_bytes 524288
        STAT hash_is_expanding 0
        STAT expired_unfetched 0
        STAT evicted_unfetched 0
        STAT bytes 303
        STAT curr_items 4
        STAT total_items 27
        STAT evictions 0
        STAT reclaimed 0

        That's how some of my stats are.  I've tried various sizes; this
        is an exceptionally small one that I was using only for testing.

        Thanks,

        Joseph Brower


        On 02/29/2012 11:16 PM, Yiftach Shoolman wrote:
              Hi Joseph,
Can you elaborate a bit more on your problem? What do you mean by
unavailable: can you set/get keys? Are your app-->memcached tcp
connections disconnected? Have you reached your memcached memory limit
(please send memcached stats)? Something else?
Also, a specific question about Magento: does it happen on the session
caching (I guess so) or on the object caching, the part that is based on
Zend caching?

Yiftach

On Thu, Mar 1, 2012 at 3:41 AM, Joseph Brower wrote:
        When I'm using Memcache (the PECL extension) with Magento,
        everything works well for an indeterminate amount of time.  After
        some time passes, Memcached becomes unavailable.  This is the odd
        part, though: I can still telnet into Memcached and issue
        commands.  I have 4 webservers all connecting to one memcache
        instance.  Does anyone have any ideas what might be going on?

        Thanks,

        Joseph Brower




--
Yiftach Shoolman
+972-54-7634621




