Kevin, I should have run some tests and caught that Content-Length is not reaching the client. Good thing Maksim detected it.
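For what it's worth, a check like this could be scripted roughly as below. This is only a sketch in Python: `declared_length` and `classify_transfer` and the returned labels are hypothetical names, not part of Knox or of any HTTP client, and it assumes the test already has the response headers and the count of body bytes it actually received.

```python
from typing import Mapping, Optional


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value as an int, or None if absent or invalid."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-length":
            try:
                length = int(value.strip())
            except ValueError:
                return None
            return length if length >= 0 else None
    return None


def classify_transfer(headers: Mapping[str, str], received_bytes: int) -> str:
    """Compare the declared body length with the number of bytes received."""
    expected = declared_length(headers)
    if expected is None:
        # Nothing to compare against, e.g. the header was stripped on the way.
        return "no-content-length"
    if received_bytes < expected:
        return "truncated"
    if received_bytes > expected:
        return "overrun"
    return "complete"
```

A "no-content-length" result on a plain file download through the gateway would have flagged the missing header, and "truncated" would flag a stream that was cut off early.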
As for your comment (2), I believe that if Knox is rewriting the content, it should ideally rewrite the Content-Length as well. But that is not going to be practical. It needs some research on how to fix the problem properly.

Thanks
Dilli

On Fri, Oct 25, 2013 at 9:01 AM, Kevin Minder <[email protected]> wrote:

> I was afraid that Knox might actually be removing the Content-Length
> header. Dilli is going to yell at me about that BTW!
>
> So there are two things that need to be done.
>
> 1) Determine the client (e.g. curl) behavior when Content-Length is
> specified.
>
> 2) Make changes in Knox so that the Content-Length response header is only
> removed if the body is being rewritten.
>
> Please file a jira for #2. I've already given this some thought so I can
> add detail.
>
>
> On 10/25/13 11:55 AM, Maksim Kononenko wrote:
>
>> On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
>> <[email protected]> wrote:
>>
>>> Maksim,
>>> Great work!
>>> Discussion inline below.
>>> Recommended next steps.
>>> 1) Add the setup steps required to get all of this working to the user's
>>> guide. File a jira.
>>> 2) Figure out a way to automate these tests. Might be hard on Apache
>>> infra.
>>> Kevin.
>>>
>>>
>>> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I was researching/testing Knox HA with Apache HTTP Server + mod_proxy +
>>>> mod_proxy_balancer.
>>>> Here is what I found.
>>>> I. 3 load balancer scheduler algorithms are available for use: Request
>>>> Counting, Weighted Traffic Counting and Pending Request Counting.
>>>> (http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler)
>>>> II. Load balancer stickyness.
>>>> (http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness)
>>>> I configured and tested stickyness. It worked as expected.
>>>> III. Failover.
>>>> (http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass)
>>>> 1. I ran the following use cases:
>>>>    a) Knox instance is down before the client request comes in.
>>>>       Steps:
>>>>       - Configure Apache HTTP Server to proxy two Knox instances;
>>>>       - Kill Knox instance A;
>>>>       - Execute a client request;
>>>>       - Verify that Knox instance A is marked as unavailable and the
>>>>         client's request is redirected to Knox instance B;
>>>>       - Verify that all subsequent requests within the same client
>>>>         session are passed only to Knox instance B;
>>>>       - Verify that Apache tries to pass client requests in a new
>>>>         session to Knox instance A.
>>>>         This is required because Knox instance A could have been
>>>>         started before the new client session.
>>>>
>>> This seems a little sub-optimal to me but there may be nothing we can do
>>> about it.
>>> The issue that I have is that I don't think Apache should be trying
>>> instance-A first every time in this case.
>>> So the question is how is Apache distributing load over instance-A and
>>> instance-B?
>>> Does it always try instance-A first or does it sometimes try instance-B
>>> first?
>>> In addition if it gets a failure for instance-A ideally it would take it
>>> out of the "pool" for some (ideally configurable) period of time.
>>>
>> It depends on the load balancer scheduler algorithm. For my tests I used
>> Request Counting.
>> I'll look for any configuration that controls how long an instance is
>> taken out of the "pool".
>>
>>>> This use case works fine.
>>>>    b) Knox instance goes down while it is processing a client's PUT
>>>>       request.
>>>>       Steps:
>>>>       - Start a PUT of a medium-sized file (200 MB) to HDFS;
>>>>       - After some time, kill the Knox instance that is processing
>>>>         this request;
>>>>       - Verify that the client gets a 500 status code and no failover
>>>>         takes place.
>>>>       This use case works as described. Apache HTTP Server is not
>>>>       able to do failover in this case.
>>>>    c) Knox instance goes down while it is processing a client's GET
>>>>       request.
>>>>       Steps:
>>>>       - Start a GET of a medium-sized file (200 MB) from HDFS;
>>>>       - After some time, kill the Knox instance that is processing
>>>>         this request;
>>>>       - Verify that the client gets a 200 status code, a
>>>>         'Content-Length' header with a value equal to the file size,
>>>>         and some bytes in the body.
>>>>         To execute this test I used the following clients:
>>>>         1) HttpClient - it doesn't produce any error when the
>>>>            stream is closed.
>>>>         2) CURL - it doesn't produce any error when the stream is
>>>>            closed.
>>>>         3) Firefox browser - it doesn't produce any error when the
>>>>            stream is closed.
>>>>         All clients just download the bytes available before the
>>>>         stream is closed, so the client has to manually compare the
>>>>         'Content-Length' header value with the number of bytes
>>>>         received.
>>>>       - No failover takes place.
>>>>       This use case works as described. Apache HTTP Server is not
>>>>       able to do failover in this case.
>>>>
>>> This is unexpected and unfortunate.
>>> I would have hoped that HttpClient and cURL at least would provide some
>>> indication that the stream was incomplete according to the Content-Length
>>> header.
>>> The only thing I would recommend trying is to take Knox out of the
>>> picture: use cURL to GET the same file directly from HDFS, kill the
>>> DataNode halfway through the stream and ensure that you see the same
>>> behavior on the client side.
>>>
>> I just rechecked all headers/data and found that I was wrong about the
>> Content-Length header. Knox received this header from the DN but it didn't
>> send it to the client. I slightly misread the logs on the Knox side.
>> I ran tests against the DN using CURL and it wrote "curl: (18) transfer
>> closed with 107092406 bytes remaining to read" when I stopped the DN.
>>
>>>> 2. Additional use cases.
>>>> What new cases could you advise?
>>>>
>>> I just want to confirm that you have tested a scenario for HDFS where the
>>> call to the NameNode goes to instance-A and the subsequent call to the
>>> DataNode goes to instance-B and this works.
>>>
>>>> IV. What functionality did I miss?
>>>>
>>> Other than the note above I don't see anything missing.
>>>
>>>> Maksim.
>>>>
>>> --
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is
>>> confidential, privileged and exempt from disclosure under applicable law.
>>> If the reader of this message is not the intended recipient, you are
>>> hereby notified that any printing, copying, dissemination, distribution,
>>> disclosure or forwarding of this communication is strictly prohibited. If
>>> you have received this communication in error, please contact the sender
>>> immediately and delete it from your system. Thank You.
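On comment (2) in the thread above, the rule "remove Content-Length only if the body is being rewritten" could be sketched as follows. This is an illustration in Python, not Knox code (Knox itself is Java); `outbound_headers` and the `body_rewritten` flag are hypothetical names, and how the gateway decides that a body is rewritten is left open here.

```python
from typing import Dict, Mapping


def outbound_headers(upstream: Mapping[str, str], body_rewritten: bool) -> Dict[str, str]:
    """Copy upstream response headers for the client.

    Content-Length is dropped only when the body was rewritten, because the
    upstream value may no longer match the bytes that are actually sent.
    """
    result: Dict[str, str] = {}
    for name, value in upstream.items():
        # HTTP header names are case-insensitive.
        if body_rewritten and name.lower() == "content-length":
            continue
        result[name] = value
    return result
```

With a pass-through body the client still sees the original length, so a client such as curl can notice a short read; with a rewritten body the length is omitted rather than sent with a stale value.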
