I don't think its SASL, if I am following the memcached logs correctly then its failing while doing a read from FD(it gets zero bytes which results in READ_ERROR). Look at the following back to back fail and success. Read fails well before the SASL(auth and etc) comes into picture, *but surprisingly it fails only when SASL is enabled.*
Interesting thing is, I tried the same thing on 5 different machines(RHEL6/5 VM), among which it works perfect on two machine(this is very confusing for me) and fails on 3 machines(VM). At tcp level I don't see any difference in the working and non working machines. Is there any tuning( at system level) which may needs to be done to make this Memcache-SASL combination to work? Just to go further, I was monitoring the FD's of the process memcached (under /proc/memcache_pid/fd), the thing I observed is whenever my app gets the MEMCACHED_TIMEOUT there are two new fds, and whenever my app gets "Success" there is only one new FD (36 in my case), like below. (Note:- I tried with db handle open option and in that case I see one FD points to db, but still in every failure I see two new FDs apart from the db handle fd, and in case of success only one new FD which is 36) *lrwx------ 1 root root 64 May 2 15:39 37 -> socket:[39863]lrwx------ 1 root root 64 May 2 15:39 36 -> socket:[39861]* *Memcached Logs*<36 new binary client connection. 36: going from conn_new_cmd to conn_waiting 36: going from conn_waiting to conn_read 36: going from conn_read to conn_parse_cmd <36 Read binary protocol data: <36 0x80 0x20 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x02 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 authenticated() in cmd 0x20 is true >36 Writing bin response: >36 0x81 0x20 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x05 >36 0x00 0x02 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x00 36: going from conn_parse_cmd to conn_mwrite 36: going from conn_mwrite to conn_new_cmd 36: going from conn_new_cmd to conn_waiting 36: going from conn_waiting to conn_read 36: going from conn_read to conn_parse_cmd <36 Read binary protocol data: <36 0x80 0x21 0x00 0x05 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x16 <36 0x00 0x02 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 authenticated() in cmd 0x21 is true 36: going from conn_parse_cmd to conn_nread mech: ``PLAIN'' with 17 bytes of data sasl result code: 0 >36 Writing bin response: >36 0x81 0x21 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x0d >36 0x00 0x02 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x00 36: going from conn_nread to conn_mwrite 36: going from conn_mwrite to conn_new_cmd 36: going from conn_new_cmd to conn_waiting 36: going from conn_waiting to conn_read 36: going from conn_read to conn_parse_cmd <36 Read binary protocol data: <36 0x80 0x01 0x00 0x4d <36 0x08 0x00 0x00 0x00 <36 0x00 0x00 0x00 0xa1 <36 0x00 0x01 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 authenticated() in cmd 0x01 is true 36: going from conn_parse_cmd to conn_nread <36 SET abcdefghij9999999999999999999999999999999999999999999999999999999999999999999 Value len is 76 > FOUND KEY abcdefghij9999999999999999999999999999999999999999999999999999999999999999999 >36 Writing bin response: >36 0x81 0x01 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x01 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x04 36: going from conn_nread to conn_mwrite 36: going from conn_mwrite to conn_new_cmd 36: going from conn_new_cmd to conn_waiting 36: going from conn_waiting to conn_read 36: going from conn_read to conn_parse_cmd <36 Read binary protocol data: <36 0x80 0x0c 0x00 0x4d <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x4d <36 0x00 0x03 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 authenticated() in cmd 0x0c is true 36: going from conn_parse_cmd to conn_nread <36 GET abcdefghij9999999999999999999999999999999999999999999999999999999999999999999 > FOUND KEY abcdefghij9999999999999999999999999999999999999999999999999999999999999999999 >36 Writing bin response: >36 0x81 0x0c 0x00 0x4d >36 0x04 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x9d >36 0x00 0x03 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x00 36: going from conn_nread to conn_mwrite 36: going from conn_mwrite to conn_new_cmd 36: going from conn_new_cmd to conn_waiting 36: going from conn_waiting to conn_read 36: going from conn_read to conn_parse_cmd <36 Read binary protocol data: <36 0x80 0x07 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x04 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 authenticated() in cmd 0x07 is true >36 Writing bin response: >36 0x81 0x07 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x04 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x00 36: going from conn_parse_cmd to conn_mwrite 36: going from conn_mwrite to conn_closing <36 connection closed. 36: going from conn_closing to conn_closed *<36 new binary client connection.* 36: going from conn_new_cmd to conn_waiting 36: going from conn_waiting to conn_read 36: going from conn_read to conn_parse_cmd <36 Read binary protocol data: <36 0x80 0x20 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x02 0x00 0x00 <36 0x00 0x00 0x00 0x00 <36 0x00 0x00 0x00 0x00 authenticated() in cmd 0x20 is true >36 Writing bin response: >36 0x81 0x20 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x05 >36 0x00 0x02 0x00 0x00 >36 0x00 0x00 0x00 0x00 >36 0x00 0x00 0x00 0x00 36: going from conn_parse_cmd to conn_mwrite 36: going from conn_mwrite to conn_new_cmd 36: going from conn_new_cmd to conn_waiting 36: going from conn_waiting to conn_read *36: going from conn_read to conn_closing* <36 connection closed. 36: going from conn_closing to conn_closed Thanks, Atul On Sunday, April 30, 2017 at 4:08:52 PM UTC-7, Dormando wrote: > > Sounds like the underlying SASL service is failing sometimes? I don't > think there's anything in the memcached code that would cause this. > > On Sat, 29 Apr 2017, watul123 wrote: > > > I am using same connection once the connection is established. What I > see so far is the inconsistency in the establishing the connection itself. > As > > pasted above in the logs, for same password/user name, same setup, same > test program, same memcached bin, if I run the test program multiple times > from > > command line, memcached_set either returns success or the > timeout(MEMCACHED_TIMEOUT). The logs from the server side shows...... > > > > going from conn_read to conn_closing > > <36 connection closed. > > 36: going from conn_closing to conn_closed > > > > If the connection establishment is successful then I can set/get "n" > number of values without any issue. > > > > > > On Wednesday, April 26, 2017 at 11:57:43 PM UTC-7, Dormando wrote: > > Are you reusing connections, or re-SASL'ing every time? I wonder > if the > > underlying sasl daemon/mechanism is failing. SASL stuff is only > ever > > checked when the connection is first established. All requests > after the > > connection setup is fine. > > > > On Wed, 26 Apr 2017, watul123 wrote: > > > > > I think the binary protocol just works as expected, there is > nothing to do with length as well, it was my test program where I commented > > out the below > > > line.... > > > "memcached_behavior_set(cache, > MEMCACHED_BEHAVIOR_BINARY_PROTOCOL, 1);" > > > > > > After fixing(un commenting) this line, no matter what length the > key/value are , it's success all the time. > > > > > > The actual issue is with SASL- connection gets closed all of > sudden like below, and the app gets Set Timeout, where as same thing gives > > success some time > > > Failed > > > <36 new binary client connection. > > > 36: going from conn_new_cmd to conn_waiting > > > 36: going from conn_waiting to conn_read > > > 36: going from conn_read to conn_parse_cmd > > > <36 Read binary protocol data: > > > <36 0x80 0x20 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x02 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > authenticated() in cmd 0x20 is true > > > >36 Writing bin response: > > > >36 0x81 0x20 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x05 > > > >36 0x00 0x02 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > 36: going from conn_parse_cmd to conn_mwrite > > > 36: going from conn_mwrite to conn_new_cmd > > > 36: going from conn_new_cmd to conn_waiting > > > 36: going from conn_waiting to conn_read > > > 36: going from conn_read to conn_closing > > > <36 connection closed. > > > > > > Success: > > > <36 new binary client connection. > > > 36: going from conn_new_cmd to conn_waiting > > > 36: going from conn_waiting to conn_read > > > 36: going from conn_read to conn_parse_cmd > > > <36 Read binary protocol data: > > > <36 0x80 0x20 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x02 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > authenticated() in cmd 0x20 is true > > > >36 Writing bin response: > > > >36 0x81 0x20 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x05 > > > >36 0x00 0x02 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > 36: going from conn_parse_cmd to conn_mwrite > > > 36: going from conn_mwrite to conn_new_cmd > > > 36: going from conn_new_cmd to conn_waiting > > > 36: going from conn_waiting to conn_read > > > 36: going from conn_read to conn_parse_cmd > > > <36 Read binary protocol data: > > > <36 0x80 0x21 0x00 0x05 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x16 > > > <36 0x00 0x02 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > authenticated() in cmd 0x21 is true > > > 36: going from conn_parse_cmd to conn_nread > > > mech: ``PLAIN'' with 17 bytes of data > > > INFO: User <real@awpcabm02> failed to authenticate > > > sasl result code: 0 > > > >36 Writing bin response: > > > >36 0x81 0x21 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x0d > > > >36 0x00 0x02 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > 36: going from conn_nread to conn_mwrite > > > 36: going from conn_mwrite to conn_new_cmd > > > 36: going from conn_new_cmd to conn_waiting > > > 36: going from conn_waiting to conn_read > > > 36: going from conn_read to conn_parse_cmd > > > <36 Read binary protocol data: > > > <36 0x80 0x01 0x00 0x17 > > > <36 0x08 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x39 > > > <36 0x00 0x01 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > authenticated() in cmd 0x01 is true > > > 36: going from conn_parse_cmd to conn_nread > > > <36 SET abcefghjklmnopqrstvwxyz Value len is 26 > > > > NOT FOUND abcefghjklmnopqrstvwxyz > > > >36 Writing bin response: > > > >36 0x81 0x01 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x01 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x01 > > > 36: going from conn_nread to conn_mwrite > > > 36: going from conn_mwrite to conn_new_cmd > > > 36: going from conn_new_cmd to conn_waiting > > > 36: going from conn_waiting to conn_read > > > 36: going from conn_read to conn_parse_cmd > > > <36 Read binary protocol data: > > > <36 0x80 0x0c 0x00 0x17 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x17 > > > <36 0x00 0x03 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > authenticated() in cmd 0x0c is true > > > 36: going from conn_parse_cmd to conn_nread > > > <36 GET abcefghjklmnopqrstvwxyz > > > > FOUND KEY abcefghjklmnopqrstvwxyz > > > >36 Writing bin response: > > > >36 0x81 0x0c 0x00 0x17 > > > >36 0x04 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x35 > > > >36 0x00 0x03 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > 36: going from conn_nread to conn_mwrite > > > 36: going from conn_mwrite to conn_new_cmd > > > 36: going from conn_new_cmd to conn_waiting > > > 36: going from conn_waiting to conn_read > > > 36: going from conn_read to conn_parse_cmd > > > <36 Read binary protocol data: > > > <36 0x80 0x07 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x04 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > <36 0x00 0x00 0x00 0x00 > > > authenticated() in cmd 0x07 is true > > > >36 Writing bin response: > > > >36 0x81 0x07 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x04 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > >36 0x00 0x00 0x00 0x00 > > > 36: going from conn_parse_cmd to conn_mwrite > > > 36: going from conn_mwrite to conn_closing > > > <36 connection closed. > > > 36: going from conn_closing to conn_closed > > > > > > > > > > > > > > > > > > > > > On Wednesday, April 26, 2017 at 11:08:45 PM UTC-7, Dormando > wrote: > > > How long are they when you get connection failure? > > > > > > On Wed, 26 Apr 2017, watul123 wrote: > > > > > > > No, its just like below > > > > > > > > const char* key = "abczadfg"; > > > > const char* value = "valuetestapp"; > > > > > > > > and > > > > > > > > I am using the this to ste the value "rc = > memcached_set(cache, key, strlen(key), value, strlen(value), (time_t)0, > flags);" > > > > > > > > Thanks, > > > > Atul > > > > > > > > On Wednesday, April 26, 2017 at 10:50:33 PM UTC-7, > Dormando wrote: > > > > Your key isn't over 256 bytes is it? > > > > > > > > On Wed, 26 Apr 2017, watul123 wrote: > > > > > > > > > Ok, thanks. Let me debug in that direction. I > will update the thread with my findings. > > > > > > > > > > BTW, if I increase the length of key and value > in my test program with binary protocol then I consistently get CONNECTION > > FAILURE > > > with > > > > below output on > > > > > memcached side > > > > > <36 new binary client connection. > > > > > 36: going from conn_new_cmd to conn_waiting > > > > > 36: going from conn_waiting to conn_read > > > > > 36: going from conn_read to conn_parse_cmd > > > > > <36 Read binary protocol data: > > > > > <36 0x73 0x65 0x74 0x20 > > > > > <36 0x61 0x62 0x63 0x7a > > > > > <36 0x20 0x30 0x20 0x30 > > > > > <36 0x20 0x36 0x0d 0x0a > > > > > <36 0x76 0x61 0x6c 0x75 > > > > > <36 0x65 0x65 0x0d 0x0a > > > > > Invalid magic: 73 > > > > > 36: going from conn_parse_cmd to conn_closing > > > > > <36 connection closed. > > > > > > > > > > Thanks again, > > > > > Atul > > > > > > > > > > On Wednesday, April 26, 2017 at 10:30:14 PM > UTC-7, Dormando wrote: > > > > > Ok. > > > > > > > > > > The binary protocol is tested pretty well > so it's *probably* something in > > > > > your program, but I won't rule out a bug > on the server either. The binary > > > > > protocol is in heavy usage in a bunch of > places though. > > > > > > > > > > On Wed, 26 Apr 2017, watul123 wrote: > > > > > > > > > > > I can test with large string quickly. In > the test app I may be sending fewer bytes, but in my real app I am sending > > way > > > more bytes. > > > > > > > > > > > > FYI: I started with SASL auth, which was > failing intermittently. After debugging I realized something is not right > > with > > > binary > > > > protocol, > > > > > then I disabled > > > > > > the SASL to take it our of picture > completely, and spawned the memcached with binary protocol and still I see > the > > the > > > intermittent > > > > > behavior. > > > > > > > > > > > > On Wednesday, April 26, 2017 at 9:54:00 > PM UTC-7, Dormando wrote: > > > > > > Looks like the protocol is getting > out of sync somehow. > > > > > > > > > > > > conn_waiting only means it's > waiting to read more bytes from the socket > > > > > > from a set. Then it looks like the > client doesn't send anymore bytes, > > > > > > times out, then closes the socket > (-> conn_read -> conn_closing). > > > > > > > > > > > > It's most likely a bug in how > you're using the binary protocol, but it's > > > > > > hard to say from here. Somehow > you're writing fewer bytes to the socket > > > > > > than you told the binary protocol > to receive. > > > > > > > > > > > > On Wed, 26 Apr 2017, watul123 > wrote: > > > > > > > > > > > > > Yes I am 100% sure. > > > > > > > When the binary protocol is in > picture then only this happens, otherwise same test program with same > > argument runs > > > perfect. > > > > I > > > > > debugged a > > > > > > lot before > > > > > > > posting to this group. I am with > you on the fact the binary protocol has nothing to do with the timeouts, > > but it is > > > the one > > > > cause > > > > > the > > > > > > failure while > > > > > > > reading from socket, then I > guess the connection gets close, and at the application level I get > > MEMCACHED_TIMEOUT. > > > > > > > > > > > > > > This is what I see on > memcached's log > > > > > > > > > > > > > > 36: going from conn_parse_cmd to > conn_mwrite > > > > > > > 36: going from conn_mwrite to > conn_new_cmd > > > > > > > 36: going from conn_new_cmd to > conn_waiting > > > > > > > 36: going from conn_waiting to > conn_read > > > > > > > 36: going from conn_read to > conn_closing > > > > > > > <36 connection closed. > > > > > > > 36: going from conn_closing to > conn_closed > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wednesday, April 26, 2017 at > 5:02:04 PM UTC-7, Dormando wrote: > > > > > > > Any way to get more > information about the timeouts you're seeing? > > > > > > > > > > > > > > There's nothing in the > protocol that would cause "timeouts", but bugs > > > > > > > somewhere could cause > clients to hang waiting on more data I guess. > > > > > > > > > > > > > > You're sure they're > timeouts and not some other kind of error? > > > > > > > > > > > > > > On Wed, 26 Apr 2017, Atul > Waghmare wrote: > > > > > > > > > > > > > > > Hi there, > > > > > > > > > > > > > > > > I am facing one issue > with memcached binary protocol. Whenever I force the memcached to use the > > binary > > > protocol, my > > > > > application get > > > > > > > occasional timeouts > > > > > > > > and occasional success. > The percentage of failure(set timeouts) is more than 80% when the memcached > > spawn > > > with > > > > binary > > > > > protocol . > > > > > > The moment > > > > > > > I remove the > > > > > > > > binary option, the > success rate is 100%. > > > > > > > > > > > > > > > > memcached - v1.4.36 > > > > > > > > libmemcached -v1.0.18 > > > > > > > > > > > > > > > > Any idea what may be > wrong? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Atul > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > --- > > > > > > > > You received this > message because you are subscribed to the Google Groups "memcached" group. > > > > > > > > To unsubscribe from this > group and stop receiving emails from it, send an email to > > > memcached+...@googlegroups.com. > > > > > > > > For more options, visit > https://groups.google.com/d/optout. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > --- > > > > > > > You received this message > because you are subscribed to the Google Groups "memcached" group. > > > > > > > To unsubscribe from this group > and stop receiving emails from it, send an email to > > memcached+...@googlegroups.com. > > > > > > > For more options, visit > https://groups.google.com/d/optout. > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > --- > > > > > > You received this message because you > are subscribed to the Google Groups "memcached" group. > > > > > > To unsubscribe from this group and stop > receiving emails from it, send an email to memcached+...@googlegroups.com. > > > > > > > For more options, visit > https://groups.google.com/d/optout. > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > --- > > > > > You received this message because you are > subscribed to the Google Groups "memcached" group. > > > > > To unsubscribe from this group and stop > receiving emails from it, send an email to memcached+...@googlegroups.com. > > > > > > For more options, visit > https://groups.google.com/d/optout. > > > > > > > > > > > > > > > > > > -- > > > > > > > > --- > > > > You received this message because you are subscribed to > the Google Groups "memcached" group. > > > > To unsubscribe from this group and stop receiving emails > from it, send an email to memcached+...@googlegroups.com. > > > > For more options, visit > https://groups.google.com/d/optout. > > > > > > > > > > > > > > -- > > > > > > --- > > > You received this message because you are subscribed to the > Google Groups "memcached" group. > > > To unsubscribe from this group and stop receiving emails from > it, send an email to memcached+...@googlegroups.com. > > > For more options, visit https://groups.google.com/d/optout. > > > > > > > > > > -- > > > > --- > > You received this message because you are subscribed to the Google > Groups "memcached" group. > > To unsubscribe from this group and stop receiving emails from it, send > an email to memcached+...@googlegroups.com <javascript:>. > > For more options, visit https://groups.google.com/d/optout. > > > > -- --- You received this message because you are subscribed to the Google Groups "memcached" group. To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.