I believe that, since Content-Length is a header that is written before
the body is rewritten, the best we can do is avoid removing the
Content-Length header when we know that we will not be rewriting the
body.
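A minimal sketch of that rule, in Python for brevity rather than Knox's
actual Java code. The function name and the will_rewrite_body flag are
hypothetical and do not correspond to any real Knox API. The point is only
that the decision has to be made before the first body byte is written:

```python
def filter_response_headers(headers, will_rewrite_body):
    """Return the response headers to forward to the client.

    Headers go out before the body, so the only information available
    here is whether the body is going to be rewritten (assumed to be
    known up front; this flag is hypothetical).
    """
    if not will_rewrite_body:
        # Body passes through untouched, so the upstream length is still valid.
        return dict(headers)
    # Body will change size, so the upstream Content-Length cannot be trusted.
    return {name: value for name, value in headers.items()
            if name.lower() != "content-length"}
```

When the body is not rewritten, the header set is returned unchanged, so
the client still sees the Content-Length sent by the backend.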
On 10/25/13 12:09 PM, Dilli Arumugam wrote:
Kevin,
I should have done some tests and detected Content-Length is not reaching
the client.
Good, Maksim detected it.
As for your comment (2), I believe that if Knox is rewriting the content,
it should ideally rewrite the Content-Length as well. But that is not going
to be practical. We need some research on how to fix the problem properly.
Thanks
Dilli
On Fri, Oct 25, 2013 at 9:01 AM, Kevin Minder
<[email protected]> wrote:
I was afraid that Knox might actually be removing the Content-Length
header. Dilli is going to yell at me about that BTW!
So there are two things that need to be done.
1) Determine the client (e.g. curl) behavior when Content-Length is
specified.
2) Make changes in Knox so that the Content-Length response header is only
removed if the body is being rewritten.
Please file a jira for #2. I've already given this some thought so I can
add detail.
On 10/25/13 11:55 AM, Maksim Kononenko wrote:
On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
<[email protected]> wrote:
Maksim,
Great work!
Discussion inline below.
Recommended next steps.
1) Add the setup steps required to get all of this working to the user's
guide. File a jira.
2) Figure out a way to automate these tests. Might be hard on Apache
infra.
Kevin.
On 10/25/13 8:55 AM, Maksim Kononenko wrote:
Hi guys,
I was researching/testing Knox HA with Apache HTTP Server + mod_proxy +
mod_proxy_balancer.
Here is what I found.
I. 3 load balancer scheduler algorithms available for use: Request
Counting, Weighted Traffic Counting and Pending Request Counting.
(http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler)
II. Load balancer stickyness.
(http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness)
I configured and tested stickyness. It worked as expected.
III. Failover.
(http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass)
1. I ran the following use cases:
a) Knox instance is down before the client request comes in.
Steps:
- Configure Apache HTTP Server to proxy two Knox instances;
- Shut down Knox instance A;
- Execute a client request;
- Verify that Knox instance A is marked as unavailable and the
client's request is redirected to Knox instance B;
- Verify that all subsequent requests within the same client
session are passed only to Knox instance B;
- Verify that client requests in a new session are first tried
against Knox instance A.
This is required because Knox instance A could have been started
again before the new client session begins.
This seems a little sub-optimal to me but there may be nothing we can do
about it.
The issue that I have is that I don't think Apache should be trying
instance-A first every time in this case.
So the question is how is Apache distributing load over instance-A and
instance-B?
Does it always try instance-A first or does it sometimes try instance-B
first?
In addition if it gets a failure for instance-A ideally it would take it
out of the "pool" for some (ideally configurable) period of time.
It depends on the load balancer scheduler algorithm. For my tests I used
Request Counting.
I'll look for any configuration that controls how long an instance is
taken out of the "pool".
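For reference, the Request Counting algorithm described in the
mod_proxy_balancer documentation can be sketched roughly as below. This is
a Python model, not Apache's code. The Worker class and the usable flag are
assumptions added for illustration; the flag stands in for Apache skipping
a worker that is in the error state. The mod_proxy documentation also
describes a retry worker parameter (in seconds) that controls how long such
a worker is skipped, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    lbfactor: int = 1      # relative share of requests for this worker
    lbstatus: int = 0      # how urgently this worker is "owed" a request
    usable: bool = True    # hypothetical stand-in for "not in error state"


def pick_worker(workers):
    """Pick the next worker the way the documented byrequests pseudo-code does."""
    candidate = None
    total_factor = 0
    for worker in workers:
        if not worker.usable:
            continue  # assumption: unusable workers are not considered
        worker.lbstatus += worker.lbfactor
        total_factor += worker.lbfactor
        if candidate is None or worker.lbstatus > candidate.lbstatus:
            candidate = worker
    if candidate is not None:
        candidate.lbstatus -= total_factor
    return candidate
```

With two equally weighted usable workers A and B, repeated calls alternate
A, B, A, B, so under this model instance-A is not tried first every time.
With A marked unusable, every call returns B.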
This use case works fine.
b) Knox instance goes down while it is processing a client's PUT
request.
Steps:
- Start a PUT of a medium-sized file (200 MB) to HDFS;
- After some time, shut down the Knox instance that is processing
this request;
- Verify that the client gets a 500 status code and no failover
takes place.
This use case works as described. Apache HTTP Server is not able to
do failover in this case.
c) Knox instance goes down while it is processing a client's GET
request.
Steps:
- Start a GET of a medium-sized file (200 MB) from HDFS;
- After some time, shut down the Knox instance that is processing
this request;
- Verify that the client gets a 200 status code, a 'Content-Length'
header with a value equal to the file size, and some bytes in the body.
To execute this test I used the following clients:
1) HttpClient - it doesn't produce any error when the stream is
closed.
2) CURL - it doesn't produce any error when the stream is closed.
3) Firefox browser - it doesn't produce any error when the stream
is closed.
All clients just download the bytes available before the stream is
closed, so the client has to manually compare the 'Content-Length'
header value with the number of bytes received.
- No failover takes place.
This use case works as described. Apache HTTP Server is not able to
do failover in this case.
This is unexpected and unfortunate.
I would have hoped that HttpClient and cURL at least would provide some
indication that the stream was incomplete according to the Content-Length
header.
The only thing I would recommend trying is to take Knox out of the
picture, use cURL to GET the same file directly from HDFS, kill the
DataNode halfway through the stream and confirm that you see the same
behavior on the client side.
I just rechecked all headers/data and found that I was wrong about the
Content-Length header. Knox received this header from the DN but didn't
send it to the client. I slightly misread the logs on the Knox side.
I ran the tests against the DN using CURL, and it wrote "curl: (18)
transfer closed with 107092406 bytes remaining to read" when I stopped
the DN.
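The kind of check behind that curl message can be sketched as follows. This
is only an illustration in Python, not how curl or HttpClient actually
implement it. The function name is hypothetical, and the check is only
possible when the Content-Length header actually reaches the client:

```python
import io


def read_and_check(stream, declared_length, chunk_size=8192):
    """Read a body to the end and compare it with the declared length.

    Returns (received, missing), where missing is the number of bytes
    the Content-Length header promised but the stream never delivered.
    """
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # stream closed, whether or not the body was complete
        received += len(chunk)
    return received, declared_length - received


# Simulated truncated transfer: 250 bytes declared, stream ends after 100.
received, missing = read_and_check(io.BytesIO(b"x" * 100), 250)
if missing > 0:
    print(f"transfer closed with {missing} bytes remaining to read")
```

Without the header the client has no declared length to compare against,
which matches the silent truncation seen through Knox.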
2. Additional use cases.
What new cases could you advise?
I just want to confirm that you have tested a scenario for HDFS where the
call to the NameNode goes to instance-A and the subsequent call to the
DataNode goes to instance-B and this works.
IV. What functionality did I miss?
Other than the note above I don't see anything missing.
Maksim.
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity
to which it is addressed and may contain information that is
confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified
that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender
immediately
and delete it from your system. Thank You.