Agreed.
Dilli

On Fri, Oct 25, 2013 at 9:23 AM, Kevin Minder
<[email protected]> wrote:

> I believe that, since Content-Length is a header that is written before
> the body is rewritten, the best we can do is avoid removing the
> Content-Length header when we know that we will not be rewriting the body.
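
A minimal Python sketch of the rule described above, for illustration only. Knox itself is written in Java, and the helper name `outbound_headers` and its `body_will_be_rewritten` flag are hypothetical, not Knox APIs. It assumes the gateway already knows, before writing headers, whether the body will be rewritten.

```python
# Hypothetical helper, not Knox's actual implementation.
def outbound_headers(upstream_headers, body_will_be_rewritten):
    """Return the (name, value) header pairs to send to the client.

    Content-Length is dropped only when the body will be rewritten,
    because the rewritten length is not known when headers are written.
    """
    result = []
    for name, value in upstream_headers:
        if body_will_be_rewritten and name.lower() == "content-length":
            continue  # length would no longer match the rewritten body
        result.append((name, value))
    return result


upstream = [("Content-Type", "application/octet-stream"),
            ("Content-Length", "209715200")]
passthrough = outbound_headers(upstream, body_will_be_rewritten=False)
rewritten = outbound_headers(upstream, body_will_be_rewritten=True)
```

With the header preserved on pass-through responses, a client can tell a truncated download from a complete one.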
>
>
> On 10/25/13 12:09 PM, Dilli Arumugam wrote:
>
>> Kevin,
>>
>> I should have done some tests and detected that Content-Length is not
>> reaching the client.
>> Good that Maksim detected it.
>>
>> As for your comment (2), I believe that if Knox is rewriting the content, it
>> should ideally rewrite the Content-Length as well. But that is not going to
>> be practical. It needs some research on how to fix the problem properly.
>>
>> Thanks
>> Dilli
>>
>>
>> On Fri, Oct 25, 2013 at 9:01 AM, Kevin Minder
>> <[email protected]> wrote:
>>
>>  I was afraid that Knox might actually be removing the Content-Length
>>> header.  Dilli is going to yell at me about that BTW!
>>>
>>> So there are two things that need to be done.
>>>
>>> 1) Determine the client (e.g. curl) behavior when Content-Length is
>>> specified.
>>>
>>> 2) Make changes in Knox so that the Content-Length response header is only
>>> removed if the body is being rewritten.
>>>
>>> Please file a jira for #2.  I've already given this some thought so I can
>>> add detail.
>>>
>>>
>>> On 10/25/13 11:55 AM, Maksim Kononenko wrote:
>>>
>>>> On Fri, Oct 25, 2013 at 4:42 PM, Kevin Minder
>>>> <[email protected]> wrote:
>>>>
>>>>   Maksim,
>>>>
>>>>> Great work!
>>>>> Discussion inline below.
>>>>> Recommended next steps.
>>>>> 1) Add the setup steps required to get all of this working to the user's
>>>>> guide.  File a jira.
>>>>> 2) Figure out a way to automate these tests.  Might be hard on Apache
>>>>> infra.
>>>>> Kevin.
>>>>>
>>>>>
>>>>> On 10/25/13 8:55 AM, Maksim Kononenko wrote:
>>>>>
>>>>>   Hi guys,
>>>>>
>>>>>> I was researching/testing Knox HA with Apache HTTP Server + mod_proxy +
>>>>>> mod_proxy_balancer.
>>>>>> Here is what I found.
>>>>>> I.   3 load balancer scheduler algorithms available for use: Request
>>>>>> Counting, Weighted Traffic Counting and Pending Request Counting. (
>>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler
>>>>>> )
>>>>>> II.  Load balancer stickyness. (
>>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness
>>>>>> )
>>>>>>         I configured and tested stickyness. It worked as expected.
>>>>>> III. Failover. (
>>>>>> http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass
>>>>>> )
>>>>>>         1. I ran the following use cases:
>>>>>>            a) Knox instance is down before client request comes in.
>>>>>>                Steps:
>>>>>>                    - Configure Apache HTTP Server to proxy two Knox
>>>>>>                      instances;
>>>>>>                    - Shut down Knox instance A;
>>>>>>                    - Execute client request;
>>>>>>                    - Verify that Knox instance A is marked as
>>>>>>                      unavailable and client's request is redirected to
>>>>>>                      Knox instance B;
>>>>>>                    - Verify that all subsequent requests in scope of
>>>>>>                      the same client's session are passed only to Knox
>>>>>>                      instance B;
>>>>>>                    - Verify that Apache tries to pass client's requests
>>>>>>                      in scope of a new session to Knox instance A.
>>>>>>                      This is required because Knox instance A could
>>>>>>                      have been started before the new client's session.
>>>>>>
>>>>> This seems a little sub-optimal to me but there may be nothing we can do
>>>>> about it.
>>>>> The issue that I have is that I don't think Apache should be trying
>>>>> instance-A first every time in this case.
>>>>> So the question is how is Apache distributing load over instance-A and
>>>>> instance-B?
>>>>> Does it always try instance-A first or does it sometimes try instance-B
>>>>> first?
>>>>> In addition if it gets a failure for instance-A ideally it would take it
>>>>> out of the "pool" for some (ideally configurable) period of time.
>>>>>
>>>> It depends on the load balancer scheduler algorithm. For my tests I used
>>>> Request Counting.
>>>> I'll look for any configuration that controls how long an instance is
>>>> taken out of the "pool".
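
The question of how Request Counting picks an instance can be illustrated with a small Python simulation. It follows the lbfactor/lbstatus bookkeeping described in the mod_proxy_balancer documentation for Request Counting; the `Worker` and `Balancer` classes, the explicit `now` argument, and the way failed workers are skipped are assumptions of this sketch, not Apache code. mod_proxy also documents a `retry` worker parameter (in seconds) for how long a worker in the error state is left alone, which `retry_seconds` stands in for here; that is worth confirming against the 2.2 docs.

```python
class Worker:
    def __init__(self, name, lbfactor=1):
        self.name = name
        self.lbfactor = lbfactor   # configured share of the work
        self.lbstatus = 0          # how urgently this worker is "owed" a request
        self.failed_at = None      # time the worker entered the error state


class Balancer:
    def __init__(self, workers, retry_seconds=60):
        self.workers = workers
        self.retry_seconds = retry_seconds

    def _usable(self, worker, now):
        # A failed worker is skipped until the retry period has elapsed.
        return (worker.failed_at is None
                or now - worker.failed_at >= self.retry_seconds)

    def choose(self, now):
        candidate, total = None, 0
        for w in self.workers:
            if not self._usable(w, now):
                continue
            w.lbstatus += w.lbfactor
            total += w.lbfactor
            if candidate is None or w.lbstatus > candidate.lbstatus:
                candidate = w
        if candidate is not None:
            candidate.lbstatus -= total
        return candidate

    def mark_failed(self, worker, now):
        worker.failed_at = now


a, b = Worker("instance-A"), Worker("instance-B")
lb = Balancer([a, b], retry_seconds=60)
first_four = [lb.choose(now=t).name for t in range(4)]
```

In this simulation two equally weighted instances alternate, so instance-A is not always tried first; this says nothing about stickyness, which pins an existing session to one instance.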
>>>>
>>>>>>                This use case works fine.
>>>>>>
>>>>>>            b) Knox instance goes down when it processes client's PUT
>>>>>>               request.
>>>>>>                Steps:
>>>>>>                    - Start executing PUT file to HDFS with medium size
>>>>>>                      (200 MB);
>>>>>>                    - After some time shut down the Knox instance which
>>>>>>                      processes this request;
>>>>>>                    - Verify that client gets 500 status code and no
>>>>>>                      failover takes place.
>>>>>>                This use case works as described. Apache HTTP Server
>>>>>>                is not able to do failover in this case.
>>>>>>            c) Knox instance goes down when it processes client's GET
>>>>>>               request.
>>>>>>                Steps:
>>>>>>                    - Start executing GET file from HDFS with medium
>>>>>>                      size (200 MB);
>>>>>>                    - After some time shut down the Knox instance which
>>>>>>                      processes this request;
>>>>>>                    - Verify that client gets 200 status code, a
>>>>>>                      'Content-Length' header with value equal to the
>>>>>>                      file size and some bytes in the body.
>>>>>>                      To execute this test I used as a client:
>>>>>>                        1) HttpClient - it doesn't produce any error
>>>>>>                           when stream is closed.
>>>>>>                        2) CURL - it doesn't produce any error when
>>>>>>                           stream is closed.
>>>>>>                        3) Firefox browser - it doesn't produce any
>>>>>>                           error when stream is closed.
>>>>>>                      All clients just download the bytes available
>>>>>>                      before the stream is closed, so the client has to
>>>>>>                      manually compare the 'Content-Length' header value
>>>>>>                      with the number of bytes received.
>>>>>>                    - No failover takes place.
>>>>>>                This use case works as described. Apache HTTP Server
>>>>>>                is not able to do failover in this case.
>>>>>>
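
The manual comparison mentioned above could look roughly like the following Python sketch. It assumes the response actually carries a Content-Length header and that the body is available as a file-like stream; `copy_checked` and `IncompleteBody` are hypothetical names made up for this illustration.

```python
import io


class IncompleteBody(Exception):
    """Raised when fewer (or more) bytes arrive than Content-Length promised."""


def copy_checked(src, dst, content_length, chunk_size=64 * 1024):
    """Copy src to dst in chunks and verify the byte count at end of stream.

    content_length is the integer value of the Content-Length header,
    or None if the header was absent (in which case nothing is verified).
    """
    received = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # stream closed, either normally or prematurely
        dst.write(chunk)
        received += len(chunk)
    if content_length is not None and received != content_length:
        raise IncompleteBody(
            f"expected {content_length} bytes, received {received}")
    return received


# A complete body passes the check.
complete = copy_checked(io.BytesIO(b"x" * 10), io.BytesIO(), content_length=10)
```

If the header is missing from the response, as with `content_length=None` here, the client has no way to notice the truncation.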
>>>>> This is unexpected and unfortunate.
>>>>> I would have hoped that HttpClient and cURL at least would provide some
>>>>> indication that the stream was incomplete according to the Content-Length
>>>>> header.
>>>>> The only thing I would recommend you try is taking Knox out of the
>>>>> picture: use cURL to GET the same file directly from HDFS, kill the
>>>>> DataNode halfway through the stream and ensure that you see the same
>>>>> behavior on the client side.
>>>>>
>>>> I just rechecked all headers/data and found that I was wrong about the
>>>> Content-Length header. Knox received this header from the DN but it didn't
>>>> send it to the client. I slightly misread the logs on the Knox side.
>>>> I ran tests against the DN using CURL and it wrote "curl: (18) transfer
>>>> closed with 107092406 bytes remaining to read" when I stopped the DN.
>>>>
>>>>>>         2. Additional use cases.
>>>>>>            What new cases could you advise?
>>>>>
>>>>> I just want to confirm that you have tested a scenario for HDFS where the
>>>>> call to the NameNode goes to instance-A and the subsequent call to the
>>>>> DataNode goes to instance-B and this works.
>>>>>
>>>>>> IV. What functionality did I miss?
>>>>>
>>>>> Other than the note above I don't see anything missing.
>>>>>
>>>>>> Maksim.
>>>>>
>>>>>>   --
>>>>>>
>>>>> CONFIDENTIALITY NOTICE
>>>>> NOTICE: This message is intended for the use of the individual or
>>>>> entity
>>>>> to which it is addressed and may contain information that is
>>>>> confidential,
>>>>> privileged and exempt from disclosure under applicable law. If the
>>>>> reader
>>>>> of this message is not the intended recipient, you are hereby notified
>>>>> that
>>>>> any printing, copying, dissemination, distribution, disclosure or
>>>>> forwarding of this communication is strictly prohibited. If you have
>>>>> received this communication in error, please contact the sender
>>>>> immediately
>>>>> and delete it from your system. Thank You.
>>>>>
>>>>>