Now that HTTP/2 priorities are no longer crashing, we ran some short 
experiments activating them in production.  However, that resulted in some 
customer-side latency problems, which I have been digging into over the past 
week.  I have been concentrating on exercising ATS via a continuous streaming 
site, taking advantage of browser developer hooks to simulate slower network 
speeds.

I found a paper that gives a nice overview of browser priority strategies: 
https://www.researchgate.net/publication/324514529_HTTP2_Prioritization_and_its_Impact_on_Web_Performance

Chrome builds a chain of priorities (first come, first served).  The session 
window limit seems to be hit first.  Firefox uses priority holder nodes to set 
up 4 or 5 buckets.  Safari seems to take a naive round-robin approach where all 
priority nodes are direct children of the root.  My testing has concentrated on 
Firefox and Chrome.

The PR adds debug messages that dump the state of the priority tree as JSON 
output, which has been very useful for tracking down these issues.
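
To give a feel for what such a dump could look like, here is a minimal C++ 
sketch of a recursive JSON dump.  It is not the code in this PR: the DumpNode 
type, its fields, and the key names in the output are all assumptions made for 
illustration.

```cpp
#include <cstdint>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a priority tree node (not the ATS type).
struct DumpNode {
  uint32_t id     = 0;
  uint32_t weight = 16;
  bool shadow     = false;
  std::vector<std::unique_ptr<DumpNode>> children;
};

// Write one node and, recursively, its children as a JSON object.
void
dump_json(const DumpNode &node, std::ostream &out)
{
  out << "{\"id\":" << node.id << ",\"weight\":" << node.weight << ",\"shadow\":" << (node.shadow ? "true" : "false")
      << ",\"children\":[";
  bool first = true;
  for (const auto &child : node.children) {
    if (!first) {
      out << ',';
    }
    first = false;
    dump_json(*child, out);
  }
  out << "]}";
}

// Convenience wrapper that returns the dump as a string for a debug message.
std::string
to_json_string(const DumpNode &root)
{
  std::ostringstream ss;
  dump_json(root, ss);
  return ss.str();
}
```

Emitting the whole tree on one line keeps each dump inside a single debug 
message, so it can be pasted straight into a JSON viewer.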

This PR has fixes for two problems.

* For Chrome, I would occasionally see the priority chain broken.  
Originally, the new direct descendant of the root would be a shadow node.  Then 
I took out the shadow node addition unless the parent_id was in the future, and 
the new stream itself became the direct root descendant.  Since the point value 
of the new node was lower than the others, it would unfairly get a larger 
share of the bandwidth (based on the weighted fair queuing logic), starving 
the older nodes and violating Chrome's first come, first served policy.  The 
underlying problem is that the parent of the new node had finished and been 
removed from the priority tree before the request for the dependent stream 
arrived.  I saw this mostly with a high bandwidth client.  I addressed this 
issue by adding a small circular buffer of ancestors, so if the parent is 
missing, we can look up where that node was originally attached.  It isn't 
perfect since we are not keeping a complete history, but once I made this 
change, I no longer saw the problem.  I think in practice only one or two 
parents will be missing before the next stream request comes.

* For Firefox, we were inserting nodes in the priority tree for priority 
frames, but we were treating them as shadow nodes.  So once their last direct 
descendant was deleted, the priority node was also deleted, losing the buckets 
for future requests on the session.  This is not desirable for the Firefox 
strategy, since we rely on the direct root descendants created by priority 
frames as weight buckets.  So rather than using the absence of an associated 
Http2Stream to identify a node as a shadow (which is true for both shadows and 
nodes created by priority frames), I added another flag to explicitly mark a 
node as a shadow.
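
The ancestor buffer idea from the Chrome fix can be sketched roughly as follows 
in C++17.  This is only an illustration under my own assumptions, not the code 
in this PR: AncestorHistory, RemovedNode, record(), parent_of(), resolve() and 
the buffer capacity are all hypothetical names and choices.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical record: a stream removed from the tree and the parent it was
// attached to when it was removed.
struct RemovedNode {
  uint32_t id        = 0;
  uint32_t parent_id = 0;
};

// Fixed-size ring buffer of recently removed nodes.  The oldest entry is
// overwritten once the buffer is full, so the history is deliberately lossy.
template <std::size_t N> class AncestorHistory
{
  static_assert(N > 0, "history needs at least one slot");

public:
  // Remember where a node was attached at the moment it is removed.
  void
  record(uint32_t id, uint32_t parent_id)
  {
    _slots[_next] = RemovedNode{id, parent_id};
    _next         = (_next + 1) % N;
    if (_count < N) {
      ++_count;
    }
  }

  // Parent the node had when it was removed, if it is still remembered.
  std::optional<uint32_t>
  parent_of(uint32_t id) const
  {
    for (std::size_t i = 0; i < _count; ++i) {
      if (_slots[i].id == id) {
        return _slots[i].parent_id;
      }
    }
    return std::nullopt;
  }

  // Walk back through removed ancestors until is_live() accepts one.
  // Returns 0 (the root) when the history does not reach a live node.
  template <typename IsLive>
  uint32_t
  resolve(uint32_t missing_parent, IsLive is_live) const
  {
    uint32_t current = missing_parent;
    for (std::size_t hops = 0; hops <= N; ++hops) {
      if (current == 0 || is_live(current)) {
        return current;
      }
      std::optional<uint32_t> parent = parent_of(current);
      if (!parent) {
        return 0;
      }
      current = *parent;
    }
    return 0;
  }

private:
  std::array<RemovedNode, N> _slots{};
  std::size_t _next  = 0;
  std::size_t _count = 0;
};
```

In this sketch, a caller would record() each node as it leaves the tree and 
call resolve() when a new stream names a parent that is no longer present, 
passing a predicate that checks the live tree.  Falling back to the root when 
the history runs out mirrors the "not a complete history" caveat above.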

I am still a bit unclear on the use of shadow nodes.  I did some searching 
based on our original behavior of creating shadow nodes for an unknown parent 
and found this discussion, https://github.com/http2/http2-spec/issues/764 , 
where Lukasa declares that the right way to do this is to make the new stream 
for the missing parent and do default priority processing.  But in their 
example, they have stream 6 depending on stream 7 (presumably a future stream). 
I looked into how nghttp2 does it (which is declared to be the right way), and 
indeed they create a new node for the missing parent with default weight and 
attach it to the root node.  Based on that discussion, I am assuming that shadow 
nodes are only meant to represent nodes that will arrive in the future (i.e. 
parent id > new stream id).
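
To make that assumption concrete, here is a small C++ sketch of the rule as I 
currently understand it, combined with the explicit shadow flag from the 
Firefox fix.  None of these names exist in ATS; MissingParentAction, 
classify_missing_parent(), SketchNode and prune_when_childless() are made up 
for illustration, and the rule itself is my reading of the discussion, not 
something the spec states in these terms.

```cpp
#include <cstdint>

// What to do when a new stream depends on a parent that is not in the tree.
enum class MissingParentAction {
  CreateShadow,  // parent id is in the future: placeholder until it arrives
  CreateDefault, // parent is gone or unknown: default priority under the root
};

// Default weight for a stream with no priority information (RFC 7540, 5.3.5).
constexpr uint32_t DEFAULT_WEIGHT = 16;

// Assumed rule: only a parent id greater than the new stream id is a shadow.
inline MissingParentAction
classify_missing_parent(uint32_t stream_id, uint32_t parent_id)
{
  return parent_id > stream_id ? MissingParentAction::CreateShadow : MissingParentAction::CreateDefault;
}

// Hypothetical node: the shadow flag is explicit instead of being inferred
// from the absence of an attached stream.
struct SketchNode {
  uint32_t id          = 0;
  uint32_t weight      = DEFAULT_WEIGHT;
  bool has_stream      = false; // true when a stream object is attached
  bool shadow          = false; // true only for future-parent placeholders
  uint32_t child_count = 0;
};

// A childless node is dropped only if it was explicitly marked as a shadow,
// so nodes created by priority frames survive as weight buckets.
inline bool
prune_when_childless(const SketchNode &node)
{
  return node.shadow && node.child_count == 0;
}
```

With this split, a node created by a priority frame has neither a stream nor 
the shadow flag, so it stays in the tree after its last child is removed.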

We are still in the process of testing these changes, but I wanted to get more 
eyes on this, particularly since I was not involved in the original priority 
code development.

I also need to augment the unit tests to model the Chrome and Firefox 
scenarios.  I hope to test some Android apps.  It would also be good to test 
some iOS apps, but I don't have access to an iOS device.

[ Full content available at: https://github.com/apache/trafficserver/pull/4212 ]