RFC: nonblocking rewrite of ap_core_output_filter

Brian Pane Sat, 24 Sep 2005 22:59:46 -0700

I just committed a new version of ap_core_output_filter() to the
async-dev branch.  The semantics of the filter have changed in a
fundamental way: the original version does blocking writes and usually
tries to write out all of the available data before returning (except
in cases where it buffers data to avoid small writes), whereas this
new version does nonblocking writes in most cases.

The goal of the rewrite was to set the stage for a clean implementation
of asynchronous write completion in the Event and/or Leader/Followers
MPMs.

The nonblocking behavior, however, appears to be useful in all MPMs.
For example, nonblocking writes can enable mod_include to parse the
next bunch of output while awaiting an ack from the client.  To avoid
infinite memory usage, the new filter does blocking writes if it has
>= 64KB of data buffered up.
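
A minimal sketch of that buffering rule, assuming a per-connection byte
counter.  The sketch_* names are hypothetical and are not taken from the
committed code; only the 64KB threshold comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Threshold from the description above: once this much data is
 * buffered, fall back to blocking writes. */
#define SKETCH_MAX_BUFFERED (64 * 1024)

typedef enum {
    SKETCH_WRITE_NONBLOCKING,
    SKETCH_WRITE_BLOCKING
} sketch_write_mode;

/* Hypothetical helper (not an httpd API): pick the write mode for the
 * next attempt from the number of bytes currently buffered. */
static sketch_write_mode sketch_choose_mode(size_t bytes_buffered)
{
    return bytes_buffered >= SKETCH_MAX_BUFFERED
        ? SKETCH_WRITE_BLOCKING
        : SKETCH_WRITE_NONBLOCKING;
}

/* Hypothetical helper: update the buffered total after a write attempt
 * that managed to send bytes_sent bytes. */
static size_t sketch_after_write(size_t bytes_buffered, size_t bytes_sent)
{
    assert(bytes_sent <= bytes_buffered);
    return bytes_buffered - bytes_sent;
}
```

With this rule, a connection that drains slowly stays nonblocking until
its backlog reaches the threshold, and returns to nonblocking mode as
soon as a write brings the backlog back under it.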

There are some significant differences from the earlier nonblocking
output filter that I posted a few weeks ago as part of the Event MPM
async write completion patch:

- If a nonblocking write attempt results in EAGAIN, the new filter
  returns APR_SUCCESS instead of APR_EAGAIN.  The old patch broke too
  much existing code that wasn't prepared for EAGAIN.  The new filter
  can return APR_SUCCESS without having actually written the entire
  brigade, but that's also true of the original
  ap_core_output_filter(), with its various setaside cases.

- The new filter doesn't try to ignore flush buckets like the earlier
  patch did.  Instead, when it encounters a flush bucket, it does a
  blocking write of everything up to that point.  The earlier patch
  tried to detect certain patterns of buckets involving flush, EOS,
  and EOC that could be interpreted as "hand this data off to the
  write completion thread instead of actually writing it out
  immediately."  But that logic was too brittle, as it depended on
  knowledge of the bucket patterns that the httpd core just happens to
  produce.  In the new design, the core output filter interprets a
  flush bucket as "write this data out before returning."  To
  implement async write completion on top of this, we'll likely have
  to remove some of the points in the core that generate flush
  buckets-- but that's a project for another day.
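
The two behaviors in the list above (success on EAGAIN with the
remainder set aside, and a blocking write up to each flush bucket) can
be modeled with a toy, self-contained sketch.  Everything here is an
assumption made for illustration: the sketch_* types are not APR or
httpd types, the "socket" is a counter rather than a real descriptor,
and the 64KB blocking threshold is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUCCESS 0   /* stands in for APR_SUCCESS */

typedef enum { SKETCH_DATA, SKETCH_FLUSH } sketch_bucket_type;

typedef struct {
    sketch_bucket_type type;
    size_t len;                 /* payload length; 0 for flush buckets */
} sketch_bucket;

/* Toy socket: a nonblocking write accepts at most `room` bytes and then
 * acts as if the kernel had reported EAGAIN; a blocking write always
 * completes. */
typedef struct {
    size_t room;                /* bytes accepted without blocking */
    size_t written;             /* total bytes sent so far */
    size_t blocking_writes;     /* how many times we had to block */
} sketch_socket;

static void sketch_write_blocking(sketch_socket *s, size_t len)
{
    if (len == 0)
        return;
    s->written += len;
    s->room = (len >= s->room) ? 0 : s->room - len;
    s->blocking_writes++;
}

static size_t sketch_write_nonblocking(sketch_socket *s, size_t len)
{
    size_t n = (len < s->room) ? len : s->room;
    s->written += n;
    s->room -= n;
    return n;                   /* n < len models an EAGAIN */
}

/* Returns SKETCH_SUCCESS even when data is left over; the unwritten
 * byte count is carried between calls in *set_aside. */
static int sketch_output_filter(sketch_socket *s, const sketch_bucket *b,
                                size_t nbuckets, size_t *set_aside)
{
    size_t pending, i;

    assert(s != NULL && set_aside != NULL);
    pending = *set_aside;       /* previously set-aside data goes first */
    for (i = 0; i < nbuckets; i++) {
        if (b[i].type == SKETCH_FLUSH) {
            /* Flush: write everything up to this point before going on. */
            sketch_write_blocking(s, pending);
            pending = 0;
        } else {
            pending += b[i].len;
        }
    }
    /* Data after the last flush gets one nonblocking attempt. */
    pending -= sketch_write_nonblocking(s, pending);
    *set_aside = pending;
    return SKETCH_SUCCESS;
}
```

In this model the caller cannot tell from the return value whether
everything reached the network, which matches the point made above
about callers that were never prepared to handle EAGAIN.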

There are a few things missing in the new version:

- It doesn't concatenate sequences of really small buckets together
  the way the original does.  If you send it a brigade containing 16
  single-byte buckets, it will do a writev of 16 bytes.  My inclination
  is to leave this "broken" in order to keep the code simple, unless
  someone has a real-world use case that produces such output.

- I haven't yet put in mod_logio support.  This shouldn't be difficult
  to add, but I want to do some experiments with a new design: sending
  an End-Of-Request bucket that calls the request logger when all the
  buckets in front of it have been written to the network.  If this
  works, the "EOR" bucket's destroy function might end up being the
  cleanest place to call the logio hooks.

- It doesn't yet do nonblocking reads on socket buckets.  Can anyone
  recommend a good test case that makes use of socket buckets?
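
The End-Of-Request bucket idea from the list above could look roughly
like the following toy model.  This is only a sketch under stated
assumptions: the sketch_* structures and the logging callback are
hypothetical, not the APR bucket API, and the "network" is just a byte
count passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t remaining;             /* unwritten bytes; 0 for an EOR bucket */
    void (*destroy)(void *ctx);   /* optional hook run when destroyed */
    void *ctx;
} sketch_bucket;

typedef struct {
    sketch_bucket *buckets;
    size_t count;
    size_t head;                  /* first bucket not yet destroyed */
} sketch_brigade;

/* Hypothetical stand-in for "call the request logger". */
static void sketch_log_request(void *ctx)
{
    int *logged = (int *)ctx;
    (*logged)++;
}

/* Record that nbytes reached the network.  Buckets are destroyed from
 * the front as they are fully written, so an EOR bucket's destroy hook
 * runs only after everything queued ahead of it has been sent. */
static void sketch_bytes_written(sketch_brigade *bb, size_t nbytes)
{
    assert(bb != NULL);
    while (bb->head < bb->count) {
        sketch_bucket *b = &bb->buckets[bb->head];
        if (b->remaining > nbytes) {
            b->remaining -= nbytes;   /* partially written; stop here */
            return;
        }
        nbytes -= b->remaining;
        b->remaining = 0;
        if (b->destroy != NULL)
            b->destroy(b->ctx);
        bb->head++;
    }
}
```

For a brigade holding a 100-byte data bucket followed by an EOR bucket,
a partial write of 60 bytes leaves the logger uncalled, and the
remaining 40 bytes trigger it exactly once.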

Thanks,
Brian
