Re: [AOLSERVER] Data "corruption" with fastpath caching

Jim Davidson Wed, 20 Aug 2008 11:38:22 -0700

Hmm... I may be confused and have to re-read all the past messages(except I've deleted them) but my understanding was it was rapid re-use of inodes within the 1-second resolution of mtime and the samefile size, all confusing the cache code that caused the problem.

With the fix below with resolution for mtime at 1 second and the"grace period" at 2 seconds I can see how it would work but it wouldmake me a bit queasy -- fixes which have assumptions of timing can befragile. With filename-based keys and unique filenames (which wouldseem like a natural requirement from someone writing similar code),the inode would be ignored and you'd get a consistent view, regardlessof timing or what other threads would be up to. I think you could trythis approach as quickly as your fix -- just define _WIN32 after the"include nsd.h" to get the filename behavior of Win32. You could runyour test and see if it was stable as well -- I'd be curious.

Again, this whole issue is interesting and the problem report quitesubtle, justifying some sort of defensive fix but using ns_returnfilefor short, dynamic content still seems like the wrong approach. Ideasto use a cache of open fd's via Ns_GetTemp or Tcl channels vians_returnfp seem closer to what's needed here.

BTW: Which OS is re-using inodes so quickly? I can't get my Mac OS/Xlaptop to do that -- figured the inode re-use/prediction thing wasplugged years ago, e.g., when fsirand was introduced for scramblingNFS vnodes.


-Jim




On Aug 20, 2008, at 12:54 PM, John Caruso wrote:

On Wednesday 08:45 AM 8/20/2008, Jim Davidson wrote:
Overall, it seems one thing to do would be to switch to filename-based
cache keys by default, leaving the dev/inode pair as an option for
folks who run sites with large symlinks and want to benefit from
caching objects just once.  I think that should avoid the data
corruption cases John pointed out with minimal downside.
Actually that wouldn't have fixed the problem in the code that ledus to find out about this in the first place. The change that Isuggested does fix that problem, though, and it directly addressesthe limitation of mtime's one-second granularity--which is the cruxof the issue. The patch below (really just a one-line change)implements this fix:
------- 8<------------------------------------------------------------------------ aolserver-4.5.0-orig/nsd/fastpath.c 2006-04-1910:48:47.000000000 -0700+++ aolserver-4.5.0/nsd/fastpath.c 2008-08-1921:22:26.000000000 -0700
@@ -507,9 +507,11 @@
    }

    if (servPtr->fastpath.cache == NULL
-           || stPtr->st_size > servPtr->fastpath.cachemaxentry) {
+           || stPtr->st_size > servPtr->fastpath.cachemaxentry
+           || (time(NULL) - stPtr->st_mtime) < 2) {
       /*
-        * Caching is disabled or the entry is too large for the cache
+        * Caching is disabled, the entry is too large for the cache,
+        * or the file was modified too recently to be cached safely,
        * so just open, mmap, and send the content directly.
        */
------- 8<---------------------------------------------------------------------
We've tested this fix extensively against the code that was hittingthe bug before, and I can verify that it resolves the problemthere. As far as I can tell this fix would resolve the issue in anystandard scenario (and certainly all of the ones I've outlined thusfar).
Given that this is a straightforward, user-transparent change thatwould have only a negligible impact on fastpath caching, andconsidering the security implications, I'd suggest that this changebe applied to the AOLserver code in CVS.
BTW, Jeff, the scenario you'd outlined that you thought would tripthis up...:
  13:50:21 - create file
  13:50:21 - serve file (gets cached)
  13:50:21 - delete file
  13:50:21 - create file again (reuses inode)
  ... time passes ...
  13:55:11 - serve file
...actually wouldn't, because the file would NOT be cached in thesecond line. The whole point of this strategy is that a file won'tbe cached if it's been modified within the threshold time (2 secondsin the patch above).
- John


--
AOLserver - http://www.aolserver.com/
To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> with thebody of "SIGNOFF AOLSERVER" in the email message. You can leave theSubject: field of your email blank.



--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> 
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

Re: [AOLSERVER] Data "corruption" with fastpath caching

Reply via email to