Revision: 14432
Author: adrian.chadd
Date: Tue Feb 23 19:31:22 2010
Log: Created wiki page through web user interface.
http://code.google.com/p/lusca-cache/source/detail?r=14432
Added:
/wiki/ProjectAsyncReadCopy.wiki
=======================================
--- /dev/null
+++ /wiki/ProjectAsyncReadCopy.wiki Tue Feb 23 19:31:22 2010
@@ -0,0 +1,48 @@
+#summary Eliminating a memcpy() when reading data from disk
+
+= Introduction =
+
+Because of the architecture of Squid/Lusca, a temporary buffer is
allocated and used to handle read requests from the disk. The data is then
copied to the caller's buffer when the IO completes.
+
+This copy becomes a significant overhead for high-hit workloads which
lean heavily on the operating system disk cache.
+
+This project attempts to eliminate the copy by enforcing, for now, that
the callback and its buffer will always remain valid until the IO is
completed or cancelled.
+
+= Overview =
+
+The Squid/Lusca codebase makes extensive use of a reference-counted object
type called "cbdata". This ensures that callbacks are only made
with "valid" callback data. Unfortunately this is used as an early abort
mechanism by a large pat of the codebase.
+
+Thus, if an asynchronous event is scheduled which uses a callback data
pointer and/or any of the memory it points to (say, a memory buffer to read
data into) there is no guarantee that the data buffer will remain valid for
the duration of the event. In the case of disk IO reads, the kernel may be
in the process of handling a read() call into the buffer in an IO thread
whilst the main Squid/Lusca thread aborts the connection and frees or
reuses the underlying memory buffer.
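+
+The hazard above can be modelled in a few lines of C. This is a
simplified, illustrative sketch of reference-counted callback data with a
validity flag - the names loosely echo the real cbdata API, but the struct
layout and functions here are invented for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative model of refcounted callback data with a validity flag.
 * "Freeing" only marks the record invalid; the storage survives until
 * the last pending callback drops its lock. */
typedef struct {
    int refcount;   /* locks held by pending asynchronous callbacks */
    int valid;      /* cleared on "free"; storage lives until refs drop */
    void *data;     /* the caller's state (and, dangerously, its buffers) */
} cbdata_t;

cbdata_t *cbdata_alloc(void *data) {
    cbdata_t *c = calloc(1, sizeof(*c));
    c->valid = 1;
    c->data = data;
    return c;
}

void cbdata_lock(cbdata_t *c) { c->refcount++; }
int cbdata_valid(const cbdata_t *c) { return c->valid; }

/* The early abort: pending callbacks see valid == 0 and bail out.  Note
 * that any raw buffer c->data pointed at may already have been reused -
 * which is exactly the disk-read race described above. */
void cbdata_mark_freed(cbdata_t *c) { c->valid = 0; }

int cbdata_unlock(cbdata_t *c) {
    if (--c->refcount == 0 && !c->valid) {
        free(c);
        return 1;   /* storage really released */
    }
    return 0;
}
```

The model shows why validity of the cbdata record says nothing about the
validity of the raw buffers it points to.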
+
+= Current Method =
+
+The current approach mirrors the support offered by various underlying
operating systems.
This aims to be a "springboard" to layer further abstractions on top of
later on as needed but does not necessarily lock the codebase into a
specific paradigm. It is also currently the riskiest!
+
+Another aim is to evaluate what is required to implement this change for
other asynchronous events with a future goal of allowing cbdata to be
properly shared between active threads. This would allow for further
processing in other threads (eg, URL rewriting, content rewriting, etc)
without requiring an intermediate copy step to ensure data remains valid
for the duration of the asynchronous event.
+
+The caller must supply a callback+cbdata AND buffer which will remain
valid for the duration of the read event. The store client, which is
effectively the callback for the read IO mechanisms, must now remain valid
until the IO completes or is explicitly cancelled.
+
+The store client now tracks whether it is active or not.
storeClientUnregister() now doesn't free the store client; it marks it as
inactive. Callbacks are then responsible for calling storeClientComplete()
to check whether the store client is done - and if so, the callback is
aborted.
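+
+The lifecycle described above can be sketched as follows. The function
names follow the text (storeClientUnregister(), storeClientComplete()),
but the struct fields and control flow are invented for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the revised store client lifecycle; not the real layout. */
typedef struct {
    int active;       /* set on register, cleared by unregister */
    int io_pending;   /* an asynchronous read has not yet completed */
} store_client;

store_client *storeClientRegister(void) {
    store_client *sc = calloc(1, sizeof(*sc));
    sc->active = 1;
    return sc;
}

/* No longer frees the store client - just marks it inactive, so any
 * in-flight IO can keep using the client's buffer safely. */
void storeClientUnregister(store_client *sc) {
    sc->active = 0;
}

/* Called from IO callbacks: returns 1 (and frees the client) only when
 * it is inactive AND no IO remains outstanding; the callback must then
 * abort without touching the client again. */
int storeClientComplete(store_client *sc) {
    if (!sc->active && !sc->io_pending) {
        free(sc);
        return 1;
    }
    return 0;
}
```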
+
+= Risks with the current method =
+
+The codebase is a very large maze of twisty passages, all alike. There's
more involved in the read path than just straight reference counting - for
example, the general store disk IO stuff involves both the store client
and storeIOState as part of the callback data for various events - and
this will likely need similar separation and treatment.
+
+The aioCancel() path needs further testing before it can be considered
fully fixed; its exact behaviour still isn't completely clear.
+
+A few of the callbacks will call storeClientComplete() to check whether
they need to be freed, and then abort the function if the store client
isn't active. I'm not entirely sure why a callback would be fired for an
in-progress but not-active store client, and this requires further
investigation. (In reality, I've forgotten why I wrote this in the past and
need to fully map out what's going on - then comment things! - before I'm
satisfied with it.)
+
+Mapping out all of the possible interactions with store client and the
storeIOState would be very, very helpful in this.
+
+storeClientComplete() shouldn't do the checks AND free things. They should
be separated out for clarity.
+
+= Alternative Approaches =
+
+One alternative is to refcount the IO buffer separately from the cbdata
for the callback. The IO layer can then increase the buffer reference, read
into it, and then release the reference once the IO completes. If the
callback data has been freed then the callback will not be called but the
IO buffer will still be valid for the duration of the IO.
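+
+A minimal sketch of this refcounted-buffer scheme, with hypothetical
names - the IO layer would take its own reference before the read() and
drop it on completion, so the buffer outlives an early abort by the
caller:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted IO buffer, separate from the callback cbdata. */
typedef struct {
    int refcount;
    char *mem;
    size_t len;
} io_buf;

io_buf *io_buf_new(size_t len) {
    io_buf *b = calloc(1, sizeof(*b));
    b->mem = malloc(len);
    b->len = len;
    b->refcount = 1;    /* the caller's reference */
    return b;
}

/* The IO layer takes a reference for the duration of the read. */
void io_buf_ref(io_buf *b) { b->refcount++; }

/* Returns 1 when the last reference is dropped and the memory is freed. */
int io_buf_unref(io_buf *b) {
    if (--b->refcount == 0) {
        free(b->mem);
        free(b);
        return 1;
    }
    return 0;
}
```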
+
+Another alternative is to request filled in pages from the IO layer - so
instead of the caller supplying a destination buffer, the caller simply
states what size the buffer should be, and is handed memory page(s) with
the relevant data.
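+
+A rough sketch of what such a page-handoff interface might look like,
with entirely hypothetical names; the synchronous stand-in below only
shows the ownership rules (the IO layer allocates the page, the callback
receives it, the caller releases it), not real disk IO:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical page handed back by the IO layer. */
typedef struct {
    char *data;   /* page memory owned by the IO layer */
    size_t len;   /* bytes of valid data */
} io_page;

typedef void (*read_done_fn)(void *cbdata, io_page *page, int error);

/* Synchronous stand-in for the asynchronous page read: no destination
 * buffer is supplied - the caller only states the size it wants. */
void storeReadPage(size_t size, read_done_fn callback, void *cbdata) {
    io_page *p = malloc(sizeof(*p));
    p->data = calloc(1, size);
    p->len = size;
    callback(cbdata, p, 0);
}

void io_page_release(io_page *p) {
    free(p->data);
    free(p);
}

/* Demo callback: just records the page it was handed. */
static io_page *last_page;
static void on_read(void *cbdata, io_page *page, int error) {
    (void)cbdata;
    (void)error;
    last_page = page;
}
```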
+
+= Development Links =
+
+* Branch:
[http://code.google.com/p/lusca-cache/source/list?path=/playpen/LUSCA_HEAD_zerocopy_storeread]
+* Diff against LUSCA_HEAD (r14431):
[http://www.creative.net.au/diffs/LUSCA_HEAD_zerocopy_storeread.r14431.diff]