Re: [AOLSERVER] Data corruption with fastpath caching
Ah -- I (finally) understand... I must have missed the detail re: serialization in message #30 out of #60 or so.

So, this clarifies to me:

-- cache by filename key is correct and good for most cases and should be on by default
-- the grace period is a clever solution for the rapid-changing, same filename case you described and deserves to be on by default
-- ns_returnfile shouldn't use the cache but already does
-- some config and/or command flags can be added to toggle the behavior

I'll update the code with the options above.

-Jim

On Aug 21, 2008, at 11:27 PM, John Caruso wrote:

> On Thursday 02:34 PM 8/21/2008, Jim Davidson wrote:
>> To clarify one point: There is no technical solution to creating temp files with the same name and avoiding the race condition without additional synchronization.
>
> To clarify as well: the original code didn't involve a race condition--it was effectively serialized, as though it were like this snippet:
>
>     foreach object $objects {
>         eval exec /some/external/program --output-file $tempfile --object $object
>         ns_returnfile 200 text/plain $tempfile
>     }
>
> (As I mentioned to you, this was basically a batch process driven by a client-side Java applet making sequential HTTP requests to an AOLserver-driven API web server, one transaction at a time, with the results being returned by ns_returnfile on the server. Also, the temp file in question was in a secure directory.) So the bug can (and did) manifest itself with serialized access.
>
>> So, here's what I'd suggest:
>>
>> -- Cache by filename key should be the default. This is technically the correct fix to enable temporary, uniquely named files to be returned via ns_returnfile.
>> -- John's grace period code is a clever optimization if fastpath is being used in this way and could also be an option, default off.
>
> Again, this wouldn't have resolved Arena's initial problem; the original code would still have hit the bug, and it would have been just as difficult to detect that that was happening (though slightly easier to debug). That's why I'd recommend having the mtime workaround code active with a default of 1--otherwise people running a default config of AOLserver will still be open to the same issue. (That's my only stake in this, BTW; Arena is already using the mtime fix and will continue to do so, but I'd really rather not have someone else run into this issue in the future.)
>
> In thinking about it today I realized that it's useful to think about the four scenarios in which the bug can currently occur (which I believe partition the bug space):
>
> 1) Monotonically increasing time with a different filename
> 2) Monotonically increasing time with the same filename
> 3) Time travelling with a different filename
> 4) Time travelling with the same filename
>
> (Time travelling here means mucking with the mtime artificially, a la rsync, and filename means fully-qualified filename.)
>
> The mtime workaround resolves scenarios 1 and 2, and using the filename as the cache key resolves scenarios 1 and 3. Nothing suggested so far resolves scenario 4--and in fact I don't think it's possible to resolve scenario 4 short of a major rewrite of the code (like Juan's suggestion of using inotify or similar functionality). So combining both fixes resolves all of the resolvable issues.
>
> For reference: the bug occurred in scenario 2, and subsequently in scenario 1. And the security implications apply to all four scenarios, though they're arguably worst in scenarios 1 and 3.
>
> - John

-- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] with the body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: field of your email blank.
Re: [AOLSERVER] Data corruption with fastpath caching
Jim,

Can I ask why the filename is important for the cache key? With the cache delay, the inode/dev + *time + size should do it all. In fact, now that I finally understand the difference between mtime and ctime: if any change is made, it should be a change to ctime.

Why ctime? ctime is unique in that it isn't something that can be set by user level programs. It changes whenever the content of the file changes, or its permissions, owner, or any other metadata. So, for instance, if someone replaces a file with an identical file, the ctime would still change. If you check the ctime, you can also skip checking the size.

But none of this has to do with the filename. On Unix, filenames are especially squishy. Both stat and open follow symbolic links, so you could use a symlink to point to different files over time, and those files could have identical mtime, atime, size, owner, etc. Using stat/open and comparing with what is in the cache based upon the filename would never detect this sleight of hand. This is why the inode is most important on Unix. On Windows, you can't do this, so filenames are safe.

My recommendations for changes are these:

1. Use John's ingenious 1-2 second delay, and optionally allow config of the cache delay.
2. Base the age for the above on the ctime, not the mtime. ctime is always younger than or the same age as mtime, covers changes to metadata, and is immune from easy modification.

Maybe: remove the fastpath config options from the basic config file, if they are even there; other example configs could be set to cache = off.

Happy Weekend everyone.

tom jackson

On Fri, 2008-08-22 at 13:18 -0400, Jim Davidson wrote:

> Ah -- I (finally) understand...
Re: [AOLSERVER] Data corruption with fastpath caching
On Aug 21, 2008, at 8:14 AM, Dossy Shiobara wrote:

> I've remained silent on this issue because I didn't want to be accused of stifling the community, etc.
> ...
> End of discussion.

Accused. Guilty.

Don Baccus
http://donb.photo.net
http://birdnotes.net
http://openacs.org
Re: [AOLSERVER] Data corruption with fastpath caching
I think there has been more than enough discussion around this issue, which I did not interfere with nor influence in any way. However, to let it continue to spin over and over is unproductive.

I'd love to hear solutions other than configurable cache key strategies or a time-based delay caching strategy, but the time to debate whether this is a defect or not is officially over: it is a defect. Let's find the right technical solution to make fastpath more robust, please.

--Original Message--
From: Don Baccus
Sender: AOLserver Discussion
To: AOLSERVER@LISTSERV.AOL.COM
ReplyTo: AOLserver Discussion
Sent: Aug 21, 2008 12:25 PM
Subject: Re: [AOLSERVER] Data corruption with fastpath caching

On Aug 21, 2008, at 8:14 AM, Dossy Shiobara wrote:

> I've remained silent on this issue because I didn't want to be accused of stifling the community, etc.
> ...
> End of discussion.

Accused. Guilty.

Don Baccus
http://donb.photo.net
http://birdnotes.net
http://openacs.org

-- Dossy Shiobara [EMAIL PROTECTED]
Re: [AOLSERVER] Data corruption with fastpath caching
On Thursday 08:14 AM 8/21/2008, Dossy Shiobara wrote:

> 4) I see the simplest (best?) solution here being a configurable parameter that controls fastpath's cache key generation. As Jim points out, one can quickly test whether this would solve the problem at hand by temporarily #define'ing _WIN32 in the appropriate place.

I'm not sure if I've mentioned this on the list, but the initial case that prompted us to discover the bug would not have been helped by this change; the main difference is that it would have shortened the debugging efforts. And this change would also alter current behavior in a major way, by defeating the explicit goal of having hard-linked files served from the same cache entry.

A fix that doesn't resolve the initial problem and which has undesirable side effects isn't worth pursuing. The change I offered is the only one that corrects the problem in the original code and all the other examples I've mentioned, transparently and without any serious side effects.

Simply put, fastpath as designed should not be caching items that have been modified within the current second, because mtime's granularity (in combination with inode reuse) doesn't allow it to distinguish them from other items in the cache. Based on what Jim's said here, I'm guessing he wasn't thinking much about the mtime granularity issue because he thought inodes would be more than enough to ensure uniqueness, but that's not always the case (in fact it's usually *not* the case, since the two most widely-used filesystems reuse inodes).

In thinking about this further, barring a complete rewrite I don't think the fastpath caching mechanism *can* be fixed completely--it can only be improved.

- John
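The collision John describes can be illustrated with a toy model. This is only a sketch in Python, not AOLserver's actual C code: it assumes, as the thread suggests, that a cache entry is identified by device, inode, whole-second mtime, and size. The names `StatInfo` and `ToyCache` are hypothetical.

```python
from typing import Callable, Dict, NamedTuple


class StatInfo(NamedTuple):
    """Hypothetical subset of stat() fields used to identify a cached file."""
    dev: int
    ino: int
    mtime: int  # whole seconds, like a classic st_mtime
    size: int


class ToyCache:
    """Toy content cache keyed only on stat fields (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._entries: Dict[StatInfo, bytes] = {}

    def fetch(self, st: StatInfo, read_from_disk: Callable[[], bytes]) -> bytes:
        # Read from disk only when no entry with identical stat fields exists.
        if st not in self._entries:
            self._entries[st] = read_from_disk()
        return self._entries[st]


cache = ToyCache()

# First temp file: written and served.
first = StatInfo(dev=1, ino=4242, mtime=1219350000, size=5)
first_body = cache.fetch(first, lambda: b"AAAAA")

# The temp file is deleted and recreated within the same second, with the same
# size, and the filesystem reuses the inode: the stat fields are identical.
second = StatInfo(dev=1, ino=4242, mtime=1219350000, size=5)
second_body = cache.fetch(second, lambda: b"BBBBB")

# second_body is the stale b"AAAAA", not the new file's b"BBBBB".
```

In this model nothing distinguishes the second file from the first, so the cache returns the old contents even though access was fully serialized.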
Re: [AOLSERVER] Data corruption with fastpath caching
Dossy Shiobara wrote:

> However, to let it continue to spin over and over is unproductive. I'd love to hear other solutions other than configurable cache key strategies or a time-based delay caching strategy, but the time to debate whether this is a defect or not is officially over: it is a defect. Let's find the right technical solution to make fastpath more robust, please.

The solutions I've seen proposed are

* configurable cache key
  + solves the problem for the most common and surprising case
  - does not solve the OP's problem
  ? what to make the default?
* time-delay
  + solves OP's problem
  - still easily broken
* flush files from cache with ns_unlink
  - easily defeated (by using different command to remove files)
  - links mostly unrelated commands
* exclude some paths from caching (e.g., don't cache if the file is outside of pageroot, or is in some configured list of non-cached directories, like /tmp)
  + keeps temporary files from taking up cache memory
  - still easily broken
  - developer needs to know to use excluded paths
* add a -nocache flag to ns_returnfile (or add a new command like ns_returntmpfile) to indicate that the file shouldn't be cached
  + makes intent clearer
  + keeps temp files out of cache
  - bloats api
  - doesn't fix existing cases
* change fastpath default to off, to be turned on only when desired and understood
  + simple, effective
  - default performance hit
* use system-provided change notifications to flush cache (e.g., inotify)
  + effective, efficient
  - non-portable

Did I leave any out?

-J
Re: [AOLSERVER] Data corruption with fastpath caching
On Thu, 2008-08-21 at 11:14 -0400, Dossy Shiobara wrote:

> 4) I see the simplest (best?) solution here being a configurable parameter that controls fastpath's cache key generation. As Jim points out, one can quickly test whether this would solve the problem at hand by temporarily #define'ing _WIN32 in the appropriate place. If this proves successful, we change it from using #ifdef's to regular if() statements and define a new configuration parameter.
>
> End of discussion.

I have responded twice to John's newest patch idea, which is a one line patch. It appears to completely eliminate any problem with cache poisoning. It is simple, and it doesn't change the semantics of the command or anything else. It simply works around a known limitation of the stat mtime granularity.

The only security issue that was exposed was the misuse of ns_returnfile. All of the data put into cache were entirely under the control of the AOLserver process. The developer / maintainer of that process is responsible for everything the process does. ns_returnfile is an inherently dangerous API; there is no handholding involved. You have to understand what it is doing and why it exists. In fact, John even pointed out that the original code which wrote out the contents of the file reused the same name over and over.

Assuming that you can know that the contents of a file have not changed just because it has the same name, same mtime and same size is an invalid assumption, and it will always be invalid. All caches have the same limitation. By definition they are not in sync with the true copy. Anyone who uses a cache needs to understand this.

So, this is important: John is not interested in the cache, he actually wants to avoid the cache. So talking about how stuff is stored in the cache, and under what key, is unimportant for John. He wants to keep his newly created file from ever getting into the cache. And this is where he has a point, a very good one. Why put newly created files into a cache, if the point of the cache is to handle static files? We can wait for evidence that it is static. In this case, we can wait until it is a few seconds old, at least. John's patch does exactly this and nothing more. It is actually a very ingenious change.

There is no difference between the inode and the filename under unix. Both offer equal opportunity to screw up due to a race condition. It can still happen even in the patched ns_returnfile. Jim mentioned this. After a file is stat'ed, the open might find a different (maybe truncated) file. There is no guarantee that you won't get something else, especially if you have multiple processes/threads creating files in a non-synchronized way.

It is not part of ns_returnfile to guarantee that the contents/age of a file remain unchanged during the course of execution, and when you throw in an external process it is nearly impossible to come up with any code which can provide that guarantee. If data integrity is really important to you, don't try to provide it using named files as temporary storage.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Thursday 09:25 AM 8/21/2008, Tom Jackson wrote:

> Why put newly created files into a cache, if the point of the cache is to handle static files? We can wait for evidence that it is static. In this case, we can wait until it is a few seconds old, at least.

This is a very good point, actually. Initially I didn't think having the value be configurable was a good way to go--it should just be set to 1, since that's the threshold value (yeah, I used 2 out of an excess of caution, but 1 would have worked fine and would probably have created less confusion).

But having this as a configurable (something like fastpath.min_cache_age?) would give people the ability to say: do NOT cache anything unless it's at least X seconds old. Which actually does seem like a genuinely useful thing on its own (completely independent of this issue), since it would allow people to prevent new/dynamic files from kicking good data out of the cache. And enforcing a minimum value of 1 would serve the dual purpose of resolving this problem as well.

- John
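A minimal sketch of the age check John describes, in Python rather than AOLserver's C. The function name `may_cache` and the constant are hypothetical; `min_cache_age` stands in for the suggested fastpath.min_cache_age setting, with the floor of 1 second enforced as proposed. This is not the actual patch.

```python
import time
from typing import Optional

# Hypothetical default for the suggested fastpath.min_cache_age setting (seconds).
DEFAULT_MIN_CACHE_AGE = 1


def may_cache(mtime: int,
              min_cache_age: int = DEFAULT_MIN_CACHE_AGE,
              now: Optional[float] = None) -> bool:
    """Return True only if the file has been unmodified for at least
    min_cache_age whole seconds, measured at 1-second mtime granularity."""
    if now is None:
        now = time.time()
    # Enforce a floor of 1 so a file modified within the current second
    # is never admitted to the cache, whatever the configured value.
    effective_age = max(1, min_cache_age)
    return int(now) - mtime >= effective_age


# A file modified during second 100 is refused until the clock reaches 101.
refused = may_cache(100, now=100.9)
accepted = may_cache(100, now=101.0)
```

Once the clock has moved past the file's mtime second, any later modification must produce a different mtime, which is why a threshold of 1 is enough at 1-second granularity.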
Re: [AOLSERVER] Data corruption with fastpath caching
I agree that John's patch is worth doing. It satisfies both his requirements and the stated design goals of fastpath.

The remaining issue is whether something called ns_returnfile which takes a pathname as a parameter should have some guarantee that you will return what at least at some point was the contents of a file with that pathname. It's perfectly acceptable in dealing with caching systems that the cached value could be out of sync, but not that the cached value could be for something entirely different from what you were looking for. Even with the mtime fix there's no guarantee that systems which muck around with mtime (such as tar) won't cause separate files to collide. For a contrived example:

1. tar xf foo.tar (creating two files a and b with the same size and same mtime)
2. ns_returnfile b
3. Delete files a and b
4. tar xf foo.tar
5. ns_returnfile b (this could return the contents of a because the inode was reused)

I don't think this example violates any of the stated principles of using ns_returnfile for only static data. Both a and b could have completely stable contents and due to some minor issue of system administration (for example) their inodes could end up swapped and the cache poisoned.

So I think we need both fixes, one to eliminate caching unless a certain criterion of static-ness has been met, and the other to prevent the cache from returning completely unrelated data. Other caveats about ns_returnfile use still apply, and the documentation should reflect them.

Now the only people this wouldn't satisfy are those who are concerned about pathnames taking up space in the cache or slowing it down. The option has been suggested to make pathname inclusion optional, though I would advise against it unless the configuration option is named in such a way as to indicate its unsafe-ness.
Titi Ala'ilima
Lead Architect
MedTouch LLC
1100 Massachusetts Avenue
Cambridge, MA 02138
617.621.8670 x309

-Original Message-
From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On Behalf Of Tom Jackson
Sent: Thursday, August 21, 2008 12:25 PM
To: AOLSERVER@LISTSERV.AOL.COM
Subject: Re: [AOLSERVER] Data corruption with fastpath caching
Re: [AOLSERVER] Data corruption with fastpath caching
I don't have any opinion on the fix, but I think the actual objection to using the filename in the fix is that this would cause hard links to files, which are for all intents and purposes The Same File, to be considered different files by fastpath. (Hard links have different names, but the same inode.)

Rusty
Re: [AOLSERVER] Data corruption with fastpath caching
Titi Alailima wrote:

> what you were looking for. Even with the mtime fix there's no guarantee that systems which muck around with mtime (such as tar) won't cause separate files to collide. For a contrived example:

I think the best you can do is to use ctime instead of mtime, or maybe btime on *bsd. You can still run into problems if you have clock skew, but there's only so much you can account for.

-J
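The difference between the two timestamps can be seen with a small experiment. This is a sketch in Python that assumes a POSIX system, where `st_ctime` is the inode change time; on Windows `st_ctime` has a different meaning. The helper name `backdate_mtime` is hypothetical.

```python
import os
import tempfile


def backdate_mtime(path: str, old_time: int) -> os.stat_result:
    """Set atime and mtime to old_time (as tar or rsync might) and re-stat."""
    os.utime(path, (old_time, old_time))
    return os.stat(path)


OLD = 1_000_000_000  # an arbitrary timestamp in 2001

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w") as f:
        f.write("hello")

    st = backdate_mtime(path, OLD)

    # mtime now claims the file is years old...
    mtime_was_backdated = int(st.st_mtime) == OLD
    # ...but ctime still reflects the recent metadata change (POSIX assumption).
    ctime_stayed_recent = st.st_ctime > OLD + 86400
```

A program can move mtime anywhere it likes with an ordinary call, while ctime is updated by the kernel as a side effect, which is the property the ctime suggestion relies on.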
Re: [AOLSERVER] Data corruption with fastpath caching
Right, I forgot that one. But the potential resolution is the same: allow a configurable unsafe mode. If you want the inode-only optimization and are willing to take on the resulting unpredictability of ns_returnfile, go right ahead. But the majority of developers who don't know or don't care will have a much safer experience.

Titi Ala'ilima
Lead Architect
MedTouch LLC
1100 Massachusetts Avenue
Cambridge, MA 02138
617.621.8670 x309

-Original Message-
From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On Behalf Of Rusty Brooks
Sent: Thursday, August 21, 2008 3:25 PM
To: AOLSERVER@LISTSERV.AOL.COM
Subject: Re: [AOLSERVER] Data corruption with fastpath caching

I don't have any opinion on the fix, but I think the actual objection to using the filename in the fix is that this would cause hard links to files, which are for all intents and purposes The Same File, to be considered different files by fastpath. (Hard links have different names, but the same inode.)

Rusty
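The hard-link trade-off can be sketched as follows. This is a Python illustration under the assumption of a filesystem that supports hard links; `inode_key` and `path_key` are hypothetical stand-ins for the two cache key strategies discussed in the thread, not AOLserver functions.

```python
import os
import tempfile
from typing import Tuple


def inode_key(path: str) -> Tuple[int, int]:
    """Key by device and inode: all hard links to a file share one entry."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def path_key(path: str) -> str:
    """Key by fully-qualified filename: each name gets its own entry."""
    return os.path.realpath(path)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "page.html")
    with open(original, "w") as f:
        f.write("<p>hi</p>")

    alias = os.path.join(tmp, "alias.html")
    os.link(original, alias)  # second name for the same inode

    shares_inode_entry = inode_key(original) == inode_key(alias)
    shares_path_entry = path_key(original) == path_key(alias)
```

With the inode key the two names map to one cache entry; with the filename key they are cached separately, which costs memory but never serves one pathname's content under another.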
Re: [AOLSERVER] Data corruption with fastpath caching
I'm lost. If you are interested in serving the same content, mtime tells you the last time the content was modified. ctime also changes for reasons unrelated to the content.

But this is a cache, which is a copy. There is never any way to guarantee that the content is the same as what is currently on disk, unless you compare the files directly. Of course that totally negates the purpose of the cache. If this is what you want, you should disable the cache completely.

BTW, if you just delete the fastpath config parameters from the config file, fastpath will be disabled, so it is disabled by default right now.

I'm also wondering if the inode/dev key just catches hard links. I think it also works via indirection with symlinks? Both stat and open follow symbolic links, so the inode is probably more stable than the filename on unix.

Anyway, trying to guarantee anything about files is a lost cause.

tom jackson

On Thu, 2008-08-21 at 12:46 -0700, Jeff Rogers wrote:

> Titi Alailima wrote:
>> what you were looking for. Even with the mtime fix there's no guarantee that systems which muck around with mtime (such as tar) won't cause separate files to collide. For a contrived example:
>
> I think the best you can do is to use ctime instead of mtime, or maybe btime on *bsd. You can still run into problems if you have clock skew, but there's only so much you can account for.
Re: [AOLSERVER] Data corruption with fastpath caching
with the same size and same mtime) 2. ns_returnfile b 3. Delete files a and b 4. tar xf foo.tar 5. ns_returnfile b (this could return the contents of a because the inode was reused) I don't think this example violates any of the stated principles of using ns_returnfile for only static data. Both a and b could have completely stable contents and due to some minor issue of system administration (for example) their inodes could end up swapped and the cache poisoned. So I think we need both fixes, one to eliminate caching unless a certain criterion of static-ness has been met, and the other to prevent the cache from returning completely unrelated data. Other caveats about ns_returnfile use still apply, and the documentation should reflect them. Now the only people this wouldn't satisfy are those who are concerned about pathnames taking up space in the cache or slowing it down. The option has been suggested to make pathname inclusion optional, though I would advise against it unless the configuration option is named in such a way as to indicate its unsafe-ness. Titi Ala'ilima Lead Architect MedTouch LLC 1100 Massachusetts Avenue Cambridge, MA 02138 617.621.8670 x309 -Original Message- From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On Behalf Of Tom Jackson Sent: Thursday, August 21, 2008 12:25 PM To: AOLSERVER@LISTSERV.AOL.COM Subject: Re: [AOLSERVER] Data corruption with fastpath caching On Thu, 2008-08-21 at 11:14 -0400, Dossy Shiobara wrote: 4) I see the simplest (best?) solution here being a configurable parameter that controls fastpath's cache key generation. As Jim points out, one can quickly test whether this would solve the problem at hand by temporarily #define'ing _WIN32 in the appropriate place. If this proves successful, we change it from using #ifdef's to regular if() statements and define a new configuration parameter. End of discussion. I have responded twice to John's newest patch idea, which is a one line patch. 
It appears to completely eliminate any problem with cache poisoning. It is simple, it doesn't change the semantics of the command or anything else. It simply works around a known limitation of the stat mtime granularity. The only security issue that was exposed was the misuse of ns_returnfile. All of the data put into cache were entirely under the control of the AOLserver process. The developer / maintainer of that process is responsible for everything the process does. ns_returnfile is an inherently dangerous API, there is no handholding involved. You have to understand what it is doing and why it exists. In fact, John even pointed out that the original code which wrote out the contents of the file reused the same name over and over. Assuming that you can know that the contents of a file have not changed just because it has the same name, same mtime and same size is an invalid assumption, it will always be invalid. All caches have the same limitation. By definition they are not in sync with the true copy. Anyone who uses a cache needs to understand this. So, this is important, John is not interested in the cache, he actually wants to avoid the cache. So talking about how stuff is stored in the cache, and under what key, is unimportant for John. He wants to keep his newly created file from ever getting into the cache. And this is where he has a point, a very good one. Why put newly created files into a cache, if the point of the cache is to handle static files? We can wait for evidence that it is static. In this case, we can wait until it is a few seconds old, at least. John's patch does exactly this and nothing more. It is actually a very ingenious change. There is no difference between the inode and the filename under unix. Both offer equal opportunity to screw up due to a race condition. It can still happen even in the patched ns_returnfile. Jim mentioned this. After a file is stat'ed, the open might find a different (maybe truncated) file. 
There is no guarantee that you won't get something else, especially if you have multiple processes/threads creating files in an non-synchronized way. It is not part of ns_returnfile to guarantee that the contents/age of a file remains unchanged during the course of execution, and when you throw in an external process it is nearly impossible to come up with any code which can provide that guarantee. If data integrity is really important to you, don't try to provide it using named files as temporary storage. tom jackson -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] with the body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: field of your email blank. -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] with the body of SIGNOFF AOLSERVER in the email message. You can leave
Re: [AOLSERVER] Data corruption with fastpath caching
On Thursday 12:08 PM 8/21/2008, Titi Alailima wrote: It's perfectly acceptable in dealing with caching systems that the cached value could be out of sync, but not that the cached value could be for something entirely different from what you were looking for. Yep. I think that aspect of the issue has been getting lost--it's not just about getting stale data from a given file, but getting data from an entirely *different* file, which I'd agree violates any reasonable expectation of a caching system. So I think we need both fixes, one to eliminate caching unless a certain criterion of static-ness has been met, and the other to prevent the cache from returning completely unrelated data. You make a good point. The fix I suggested is intended to make fastpath caching behave well in all cases where time proceeds monotonically (which I'd guess is by far the most common use case, especially for a web server that's unlikely to call utilities like tar/rsync that would munge file times). In fact that's essentially what I mean by pathological. But to protect against the time-travelling scenarios causing fastpath to confuse two different files, you'd have to use the filename-as-key fix as well. Using the filename as a key is a bummer for sites like AOL that want the cache to respect hard links when serving data from the cache, though. It won't matter for us either way, since we're not that concerned about the cache in the first place--just about the ways in which it might return the wrong data. (In case anyone's wondering: Arena's web application pre-caches large amounts of data when it starts, and web servers aren't put into rotation until they've finished this pre-caching step. The web servers are also bathed in RAM, so it's unlikely that fastpath caching is offering much of a performance boost over the Linux page cache--especially when that time is set against the overhead of database access.) 
- John -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] with the body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: field of your email blank.
Re: [AOLSERVER] Data corruption with fastpath caching
On Thu, 2008-08-21 at 17:34 -0400, Jim Davidson wrote:
> So, technically this was a case where we dynamically created code which was later read by ADP (which had the same dev/inode cache stuff as fastpath). However, this was done carefully:
>
> -- Tcl-level mutex/condition variables to ensure only one thread did the hard work even if several were interested in the result
> -- Careful write to a non .adp extension, unique temp file
> -- Atomic rename in place when ready
>
> It was a combination of traditional atomic Unix filesystem semantics and newer thread synchronization at the Tcl level used to avoid ever getting some mutant result.

Here is an example using Tcl-level commands (although with a hidden use of mutex/condition vars):

    # save datastore to file
    # Note: a data.tmp file is created. If writing to this
    # succeeds, this is renamed to the data file, hopefully
    # atomically replacing it
    # Note2: the data.tmp file is not a lock file, it is used
    # to avoid a half written file in the event of power loss
    # or process exit.
    proc ::datastore::save { store } {
        set LockID [lock $store]
        if {[catch {
            set FD [open ${dataFileroot}.tmp w+]
            fconfigure $FD -translation binary -encoding binary
            puts $FD $out
            close $FD
            file rename -force ${dataFileroot}.tmp $dataFileroot
        } err ]} {
            # release the lock before re-raising the error
            unlock $store $LockID
            error $err "::datastore::save error saving store $store"
        }
        unlock $store $LockID
    }

But it is very difficult (impossible) to safely read/write files unless you can synchronize access (you need cooperation) and/or use atomic file operations (serialize access). The above example uses both.

tom jackson
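The write-to-temp-then-rename half of the example above can also be sketched in C. This is only an illustration under stated assumptions: save_atomically is a hypothetical helper, not an AOLserver or Tcl API, the ".tmp" suffix simply mirrors the Tcl example, and the locking that serializes writers is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write data to "<path>.tmp", then rename it over
 * path. Returns 0 on success, -1 on failure. Readers see either the old
 * file or the complete new one, never a half-written file.
 */
int save_atomically(const char *path, const char *data)
{
    char tmp[4096];
    FILE *fp;
    size_t len;
    int n;

    assert(path != NULL && data != NULL);

    /* Build the temporary name next to the target so that rename()
     * stays on the same filesystem. */
    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof tmp) {
        return -1;                      /* path too long for the buffer */
    }

    fp = fopen(tmp, "wb");
    if (fp == NULL) {
        return -1;
    }
    len = strlen(data);
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        remove(tmp);                    /* do not leave a partial temp file */
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(tmp);
        return -1;
    }

    /* On POSIX systems rename() replaces the target in one step. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

The sketch does not call fsync, so it makes no promise about surviving power loss, and two concurrent writers would still race on the shared temp name. That is why the Tcl version wraps the whole sequence in a lock.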
Re: [AOLSERVER] Data corruption with fastpath caching
On Thu, 2008-08-21 at 13:52 -0700, John Caruso wrote:
> On Thursday 12:08 PM 8/21/2008, Titi Alailima wrote:
> > It's perfectly acceptable in dealing with caching systems that the cached value could be out of sync, but not that the cached value could be for something entirely different from what you were looking for.
>
> Yep. I think that aspect of the issue has been getting lost--it's not just about getting stale data from a given file, but getting data from an entirely *different* file, which I'd agree violates any reasonable expectation of a caching system.

John,

Your patch fixes this issue as best it can be fixed. The issue that Titi is addressing cannot be fixed. With your patch you can be sure of the following:

1. All cache entries are unique. You can't create and cache two files with the same inode and mtime and have both be over 1 second old.
2. When an inode is reused (by the filesystem) the associated file mtime will be larger than the mtime of the cached entry. The new entry will replace the old entry.
3. If several files point to the same inode, updating this entry updates all of them (there is only one cache entry if the inode is used).
4. If a file changes on disk, and it is less than 2 sec old, it is served directly, skipping the cache.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Tue, 2008-08-19 at 18:25 -0700, John Caruso wrote:
> On Tuesday 05:59 PM 8/19/2008, Juan José del Río wrote:
> > If you don't want to deactivate it, and have some C skills, I would recommend you to make the needed changes to fastpath code to enable it to use the kernel facilities of the operating system (in case you're using linux, then that'll be epoll system call; in FreeBSD case it's kqueue; etc.).
>
> This is an interesting suggestion, but from a quick scan of the epoll man page it doesn't look like it would work in this case since it acts on an open file descriptor, but fastpath associates file data with a (dev, inode, mtime, size) tuple without keeping an open file descriptor (and it'd be pretty wonky for AOLserver to keep open file descriptors for all files currently in the fastpath cache). No matter, though, we've got plenty of workarounds, and we'll probably just disable fastpath entirely since the benefits are likely vanishingly small anyway.

Sorry John, I said epoll, but I meant inotify.
Re: [AOLSERVER] Data corruption with fastpath caching
Could someone document ns_returnfp while we're talking about it?

Titi Ala'ilima
Lead Architect
MedTouch LLC
1100 Massachusetts Avenue
Cambridge, MA 02138
617.621.8670 x309

-----Original Message-----
From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On Behalf Of Jim Davidson
Sent: Tuesday, August 19, 2008 8:39 PM
To: AOLSERVER@LISTSERV.AOL.COM
Subject: Re: [AOLSERVER] Data corruption with fastpath caching

You're right, the code snippet below could trip over a race condition as you've described. But that's not reason enough to change the fastpath; it's reason to better document the behavior so folks don't write code that uses ns_returnfile for temporary, dynamic content. Although fastpath takes care to be correct in most cases (e.g., stat'ing the file on each request and serializing reads on a cache miss), the "fast" in fastpath is because it's primarily designed to return simple static content with minimal overhead.

BTW: I believe the ns_returnfile command didn't use the fastpath originally -- I think it just opened and sent the content. It was changed because folks asked for it to go faster, I think -- can't recall.

Anyway, for your app, it might be easiest to not change your code but instead write a new ns_returnfile to override the builtin -- maybe just with open and ns_returnfp.

-Jim

On Aug 19, 2008, at 4:00 PM, John Caruso wrote:
> On Monday 05:53 PM 8/18/2008, Jeff Rogers wrote:
> > russell muetzelfeldt wrote:
> > > fastpath is making assumptions about what means something is the same file, and those assumptions are not consistent with unix filesystem semantics - how is this not a bug?
> >
> > It's not a bug because no one ever said that it *was* strictly following unix filesystem semantics, which isn't even a single thing (ufs is slightly different than nfs, is slightly different than ext2 -noatime, is slightly different than afs, etc.) It is following a particular definition: if the file still exists and has the same dev/inode/mtime/size as it did when you last checked, then it is the same file. Think of it as an if-modified-since or if-none-match conditional GET.
>
> Actually that's not analogous, for the same reason that the analogies to caching of attributes in NFS, rsync or tar not noticing content changes if attributes stay the same, etc, don't apply: because this bug can happen *even with two files that have completely different names or paths*. Again, in this example...:
>
>     set file [open /var/tmp/myfile w]
>     puts $file ABC123
>     close $file
>     ns_returnfile 200 text/plain /var/tmp/myfile
>     ns_unlink -nocomplain /var/tmp/myfile
>
>     set file [open /var/tmp/myotherfile w]
>     puts $file XYZ987
>     close $file
>     ns_returnfile 200 text/plain /var/tmp/myotherfile
>     ns_unlink -nocomplain /var/tmp/myotherfile
>
> ...AOLserver will almost always return the contents of /var/tmp/myfile rather than /var/tmp/myotherfile in response to the second ns_returnfile.
>
> I think the analogies to other systems aren't really germane anyway--AOLserver's behavior has to be judged on its own merits. But adopting that standard, I can't think of any other program that would confuse /var/tmp/myfile with /var/tmp/myotherfile.
>
> - John
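The "same dev/inode/mtime/size means same file" definition quoted above, and why John's two temp files collide under it, can be illustrated with a small C sketch. The file_identity struct and same_identity function are hypothetical names used only for this illustration; the real fastpath code takes these values from struct stat, and the numbers in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical stand-in for the attributes the definition compares. */
typedef struct {
    dev_t  dev;    /* device number */
    ino_t  ino;    /* inode number */
    time_t mtime;  /* modification time, whole seconds only */
    off_t  size;   /* file size in bytes */
} file_identity;

/*
 * True when two snapshots would be treated as "the same file".
 * Note that the pathname plays no part in the comparison.
 */
bool same_identity(const file_identity *a, const file_identity *b)
{
    assert(a != NULL && b != NULL);
    return a->dev == b->dev
        && a->ino == b->ino
        && a->mtime == b->mtime
        && a->size == b->size;
}
```

In John's example both files hold six characters plus the newline that puts appends, so the sizes match (7 bytes). If the filesystem hands the freed inode to the second file within the same second, all four fields are equal and same_identity returns true even though the names differ. A one-second difference in mtime is enough to make it return false.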
Re: [AOLSERVER] Data corruption with fastpath caching
Titi Alailima wrote:
> This sounds like the problem. Not a bug with fastpath,

Come on folks, the discussion of whether or not the behavior shown by John Caruso is a bug is completely fruitless. Most AOLserver users are not novices, and if it takes some of us several weeks to find what the problem is, we should act and not insist that it is no bug.

In my opinion, switching the caching index from inodes to file names (as on Windows) is a very reasonable solution that gets rid of most problems (although it is less cache efficient). It would additionally be a nice feature to provide a configuration option for getting back the current behavior (for people with tons of links). This option would guarantee backward compatibility. This way, one could safely leave fastpath switched on by default.

-gustaf neumann
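The trade-off between the two cache indexes discussed in this thread can be sketched as follows. Everything here is hypothetical: key_mode stands in for the proposed configuration option, which did not exist at the time of this message, and cache_key is a simplified record rather than the fastpath data structure. In either mode the mtime and size would still need to be checked after the key matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical switch standing in for the proposed config option. */
typedef enum { KEY_BY_INODE, KEY_BY_PATH } key_mode;

typedef struct {
    const char *path;  /* fully qualified filename */
    dev_t dev;         /* device number */
    ino_t ino;         /* inode number */
} cache_key;

/* Decide whether two lookups refer to the same cache slot. */
bool keys_match(key_mode mode, const cache_key *a, const cache_key *b)
{
    assert(a != NULL && b != NULL);
    if (mode == KEY_BY_PATH) {
        /* Distinct names never share a slot, even on a reused inode. */
        return strcmp(a->path, b->path) == 0;
    }
    /* Inode mode: hard links share a slot, but so does a reused inode. */
    return a->dev == b->dev && a->ino == b->ino;
}
```

With KEY_BY_INODE, /var/tmp/myfile and /var/tmp/myotherfile match whenever the second file inherits the first file's inode. With KEY_BY_PATH they never match, at the cost that two hard links to one file are cached twice.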
Re: [AOLSERVER] Data corruption with fastpath caching
On Wednesday 08:45 AM 8/20/2008, Jim Davidson wrote:
> Overall, it seems one thing to do would be to switch to filename-based cache keys by default, leaving the dev/inode pair as an option for folks who run sites with large symlinks and want to benefit from caching objects just once. I think that should avoid the data corruption cases John pointed out with minimal downside.

Actually that wouldn't have fixed the problem in the code that led us to find out about this in the first place. The change that I suggested does fix that problem, though, and it directly addresses the limitation of mtime's one-second granularity--which is the crux of the issue. The patch below (really just a one-line change) implements this fix:

--- 8< ---
--- aolserver-4.5.0-orig/nsd/fastpath.c 2006-04-19 10:48:47.0 -0700
+++ aolserver-4.5.0/nsd/fastpath.c 2008-08-19 21:22:26.0 -0700
@@ -507,9 +507,11 @@
     }

     if (servPtr->fastpath.cache == NULL
-        || stPtr->st_size > servPtr->fastpath.cachemaxentry) {
+        || stPtr->st_size > servPtr->fastpath.cachemaxentry
+        || (time(NULL) - stPtr->st_mtime) < 2) {
         /*
-         * Caching is disabled or the entry is too large for the cache
+         * Caching is disabled, the entry is too large for the cache,
+         * or the file was modified too recently to be cached safely,
          * so just open, mmap, and send the content directly.
          */
--- 8< ---

We've tested this fix extensively against the code that was hitting the bug before, and I can verify that it resolves the problem there. As far as I can tell this fix would resolve the issue in any standard scenario (and certainly all of the ones I've outlined thus far).

Given that this is a straightforward, user-transparent change that would have only a negligible impact on fastpath caching, and considering the security implications, I'd suggest that this change be applied to the AOLserver code in CVS.

BTW, Jeff, the scenario you'd outlined that you thought would trip this up...:

    13:50:21 - create file
    13:50:21 - serve file (gets cached)
    13:50:21 - delete file
    13:50:21 - create file again (reuses inode)
    ... time passes ...
    13:55:11 - serve file

...actually wouldn't, because the file would NOT be cached in the second line. The whole point of this strategy is that a file won't be cached if it's been modified within the threshold time (2 seconds in the patch above).

- John
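Pulled out of its surroundings, the condition the patch extends can be sketched as a standalone predicate. This is a simplified illustration, not the fastpath.c code: bypass_cache and its parameters are hypothetical names, "caching disabled" is modelled as a negative max_entry instead of a NULL cache pointer, and the clock is passed in as an argument so the behavior is easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/*
 * Returns true when a file should be opened and sent directly
 * instead of going through the cache.
 */
bool bypass_cache(off_t size, time_t mtime, time_t now,
                  off_t max_entry, int grace_seconds)
{
    assert(grace_seconds >= 0);
    if (max_entry < 0) {
        return true;            /* caching is disabled (sketch convention) */
    }
    if (size > max_entry) {
        return true;            /* entry is too large for the cache */
    }
    if (now - mtime < grace_seconds) {
        return true;            /* modified too recently to cache safely */
    }
    return false;
}
```

With a two-second grace period, a file whose mtime equals the current second, or the previous one, is served directly. It becomes cacheable once its mtime is at least two seconds in the past. A file with an mtime in the future also bypasses the cache, because the difference is negative.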
Re: [AOLSERVER] Data corruption with fastpath caching
On Aug 20, 2008, at 11:54 AM, John Caruso wrote:
> Actually that wouldn't have fixed the problem in the code that led us to find out about this in the first place. The change that I suggested does fix that problem, though, and it directly addresses the limitation of mtime's one-second granularity--which is the crux of the issue. The patch below (really just a one-line change) implements this fix:

all this traffic to use functionality in a way that wasn't intended. i fail to see how the right problem is being solved here.
Re: [AOLSERVER] Data corruption with fastpath caching
John Caruso wrote:
> BTW, Jeff, the scenario you'd outlined that you thought would trip this up...:
>
>     13:50:21 - create file
>     13:50:21 - serve file (gets cached)
>     13:50:21 - delete file
>     13:50:21 - create file again (reuses inode)
>     ... time passes ...
>     13:55:11 - serve file
>
> ...actually wouldn't, because the file would NOT be cached in the second line. The whole point of this strategy is that a file won't be cached if it's been modified within the threshold time (2 seconds in the patch above).

Fine, then change that first timestamp to 13:50:18 (say if you ran another external program after creating the file but before serving it that took more than 2 seconds, or if your external program backdated the file mtime). It's still a race condition that you'll hit if all the stars are in the wrong place. And it still hurts the optimization of using a 404 adp page to generate a heavyweight file only once that gets cached.

If your patch solves your problem, that's great, and that's the whole point of OSS. But it does nothing to solve the problem generally and has negative side effects, so I think it would be a mistake to add it to the general distribution.

-J
Re: [AOLSERVER] Data corruption with fastpath caching
Hmm... I may be confused and have to re-read all the past messages (except I've deleted them), but my understanding was that it was rapid re-use of inodes within the 1-second resolution of mtime, together with the same file size, that confused the cache code and caused the problem.

With the fix below, with the resolution for mtime at 1 second and the grace period at 2 seconds, I can see how it would work, but it would make me a bit queasy -- fixes which have assumptions of timing can be fragile. With filename-based keys and unique filenames (which would seem like a natural requirement from someone writing similar code), the inode would be ignored and you'd get a consistent view, regardless of timing or what other threads would be up to. I think you could try this approach as quickly as your fix -- just define _WIN32 after the #include "nsd.h" to get the filename behavior of Win32. You could run your test and see if it was stable as well -- I'd be curious.

Again, this whole issue is interesting and the problem report quite subtle, justifying some sort of defensive fix, but using ns_returnfile for short, dynamic content still seems like the wrong approach. Ideas to use a cache of open fd's via Ns_GetTemp or Tcl channels via ns_returnfp seem closer to what's needed here.

BTW: Which OS is re-using inodes so quickly? I can't get my Mac OS/X laptop to do that -- figured the inode re-use/prediction thing was plugged years ago, e.g., when fsirand was introduced for scrambling NFS vnodes.

-Jim

On Aug 20, 2008, at 12:54 PM, John Caruso wrote:
> On Wednesday 08:45 AM 8/20/2008, Jim Davidson wrote:
> > Overall, it seems one thing to do would be to switch to filename-based cache keys by default, leaving the dev/inode pair as an option for folks who run sites with large symlinks and want to benefit from caching objects just once. I think that should avoid the data corruption cases John pointed out with minimal downside.
Re: [AOLSERVER] Data corruption with fastpath caching
Jim Davidson wrote:
> BTW: Which OS is re-using inodes so quickly? I can't get my Mac OS/X laptop to do that -- figured the inode re-use/prediction thing was plugged years ago, e.g., when fsirand was introduced for scrambling NFS vnodes.

Linux. This tcl page:

    set fn /tmp/tmpfile[expr rand()]
    set f [open $fn w]
    puts $f [ns_queryget data]
    close $f
    after 2
    ns_returnfile 200 text/plain $fn
    ns_unlink $fn

being hit at the same time in 2 windows:

    $ while true; do res=`curl -s http://localhost:8000/crap.tcl?data=wxyz`; if [ $res != 'wxyz' ]; then echo $res; break ; fi; echo -n . ; done

    $ while true; do res=`curl -s http://localhost:8000/crap.tcl?data=wxyz`; if [ $res != 'wxyz' ]; then echo $res; break ; fi; echo -n . ; done

will cause one or the other test script to get the error, typically within 5 requests.

-J
Re: [AOLSERVER] Data corruption with fastpath caching
On Wednesday 10:58 AM 8/20/2008, Jim Davidson wrote: With the fix below with resolution for mtime at 1 second and the grace period at 2 seconds I can see how it would work but it would make me a bit queasy -- fixes which have assumptions of timing can be fragile. Perhaps, but not in this case. This is just another way to circumvent use of the cache on an object-by-object basis (like the size restriction that's already there), which recognizes the fact that using mtime as a determiner of uniqueness is limited by the fact that mtime has a granularity of only one-second. As for fragility, the fastpath algorithm is fragile *now*, thanks precisely to its assumptions about timing. This very simple change removes the bulk of that fragility. Again, this whole issue is interesting and the problem report quite subtle, justifying some sort of defensive fix but using ns_returnfile for short, dynamic content still seems like the wrong approach. Whether or not that's so, the fact is that everyone on this list appeared to share the same utterly natural assumption that ns_returnfile X really will return file X--which turns out to be untrue solely because of fastpath caching's design limitation. This fix resolves that design limitation in any standard circumstance. BTW: Which OS is re-using inodes so quickly? The ext3 filesystem on Linux and the ufs filesystem on Solaris both re-use inodes in this way. Jeff, On Wednesday 10:56 AM 8/20/2008, Jeff Rogers wrote: John Caruso wrote: BTW, Jeff, the scenario you'd outlined that you thought would trip this up...: 13:50:21 - create file 13:50:21 - serve file (gets cached) 13:50:21 - delete file 13:50:21 - create file again (reuses inode) ... time passes ... 13:55:11 - serve file ...actually wouldn't, because the file would NOT be cached in the second line. The whole point of this strategy is that a file won't be cached if it's been modified within the threshold time (2 seconds in the patch above). 
Fine, then change that first timestamp to 13:50:18 [...] No, you're still not understanding how the patch works. If you change the first timestamp to 13:50:18, the file will indeed be cached at 13:50:21--but with an mtime of 13:50:18. When the new file is served at 13:55:11, it will *not* result in a cache hit because the mtime will be different. That's exactly how the patch fixes this issue. If your patch solves your problem, that's great, and that's the whole point of OSS. But it does nothing to solve the problem generally and has negative side effects, so I think it would be a mistake to add it to the general distribution. I'm surprised you're taking such an all-or-nothing view now, given that you started out being open to discussion. This patch certainly does solve the problem generally--in all but what I'd say are pathological cases, and certainly in any standard usage (like the various examples I've posted). And it does it by directly addressing the fastpath algorithm's reliance on mtime, which has only one-second granularity. - John -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] with the body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: field of your email blank.
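The two timelines argued over above can be traced with a toy model of a single cache slot for the reused inode. This is a sketch of the behavior as described in the thread, not AOLserver code: cache_entry, outcome and serve are hypothetical names, and clock times are written as seconds since midnight (13:50:18 is 49818, 13:50:21 is 49821, 13:55:11 is 50111).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One cache slot for one (dev, inode) pair. */
typedef struct {
    bool valid;   /* is anything cached for this inode? */
    long mtime;   /* mtime recorded when the entry was cached */
    long size;    /* size recorded when the entry was cached */
} cache_entry;

typedef enum { SERVED_DIRECT, SERVED_FROM_CACHE, SERVED_AND_CACHED } outcome;

/* Serve a file with the given mtime and size at time now. */
outcome serve(cache_entry *entry, long now, long mtime, long size, long grace)
{
    assert(entry != NULL);
    if (now - mtime < grace) {
        return SERVED_DIRECT;           /* too new: the cache is skipped */
    }
    if (entry->valid && entry->mtime == mtime && entry->size == size) {
        return SERVED_FROM_CACHE;       /* attributes match: cache hit */
    }
    /* Miss: read from disk and replace whatever was cached before. */
    entry->valid = true;
    entry->mtime = mtime;
    entry->size = size;
    return SERVED_AND_CACHED;
}
```

In Jeff's revised scenario, serve(&e, 49821, 49818, 7, 2) caches the first file with mtime 49818. The recreated file has mtime 49821, so serve(&e, 50111, 49821, 7, 2) is a miss that replaces the entry, not a stale hit. In the original scenario, where the file is created and served within the same second, the first call returns SERVED_DIRECT and nothing is cached at all.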
Re: [AOLSERVER] Data corruption with fastpath caching
By the way, Jeff, regarding this...: On Wednesday 10:56 AM 8/20/2008, Jeff Rogers wrote: And it still hurts the optimization of using a 404 adp page to generate a heavyweight file only once that gets cached. ...which you'd explained elsewhere as...: There is also at least one clever optimization where static content does get served within a second of being created, where the 404 page is used to generate something like an image from something like a database and writes it to a file where it is subsequently served by fastpath. ...this fix doesn't break this functionality. You can still do it and it'll still work. And in fact, others have been arguing (and I believe you've been agreeing) that serving anything other than truly static content with ns_returnfile is immoral anyway--so it seems more than a bit contradictory to use that exact case to argue against the fix. In any case, though, assuming that this once-generated heavyweight file is actually reused multiple times, it *will* be cached...just not for 1-2 seconds. I think the negligible cost of not having caching for just-created files for a period of 1-2 seconds is more than justified by the necessity to patch such a serious data corruption and security hole. (I used two seconds out of an excess of caution, BTW; one would be sufficient.) - John -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] with the body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: field of your email blank.
Re: [AOLSERVER] Data corruption with fastpath caching
On Aug 20, 2008, at 1:29 PM, John Caruso wrote:
> Whether or not that's so, the fact is that everyone on this list appeared to share the same utterly natural assumption that ns_returnfile X really will return file X--which turns out to be untrue solely because of fastpath caching's design limitation. This fix resolves that design limitation in any standard circumstance.

Use ns_returnfile for static data, as it was intended, or you will have problems.

How is this thread still alive?
Re: [AOLSERVER] Data corruption with fastpath caching
John,

The patch below is not at all unreasonable as far as your stated goal of not caching a newly modified file.

BTW, I think that fastpath is not enabled by default; your config has to have the fastpath config section with the cache parameter set to true.

Also, ns_returnfile is really just an internal C API which has been exposed to Tcl so that developers can create specific file handlers. In general it is dangerous to serve content from outside of pageroot. But that just means that the developer has to put a lot of care into avoiding problems. In other words, ns_returnfile is not really intended as an end-user API. It usually ends up wrapped in some more friendly procedure that handles things like relative paths, security, etc.

tom jackson
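As an illustration of the setting Tom mentions, a config fragment along these lines would turn the fastpath cache off explicitly. This is a sketch under assumptions: the section path ns/server/${server}/fastpath, a $server variable set earlier in the config file, and the cache parameter being read as a boolean.

```tcl
# Sketch only; $server is assumed to hold the server name set earlier
# in the AOLserver config file.
ns_section "ns/server/${server}/fastpath"
ns_param   cache false   ;# do not cache file contents in fastpath
```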
Re: [AOLSERVER] Data corruption with fastpath caching
On 8/20/08 11:29 AM, John Caruso [EMAIL PROTECTED] wrote:
> Whether or not that's so, the fact is that everyone on this list appeared to share the same utterly natural assumption that ns_returnfile X really will return file X

All,

I've been on vacation or I would have chimed in earlier, but as John's client and CTO of the company that found the problem (and is now faced with a fairly extensive and difficult impact assessment to determine whether the confidentiality and integrity of our customers' data has been compromised), I find the suggestion that this is not a bug to be utterly baffling.

Perhaps if the procedure in question were called ns_returnfromcache, I could see the arguments against the behavior being considered a bug, but the name of the procedure is ns_returnfile, and it takes an argument which is a filename. Our objective in using the procedure was not to return a dynamic file through the cache, it was to return a dynamically generated file (which was produced by an exec of an OS-level command) from the filesystem...and the documentation for the procedure certainly did not suggest that its functionality did not support this usage.

Obviously we'll work around the problem in the future, but it is disheartening to find a fairly subtle bug, report it with a reproducible test case, and be challenged so aggressively on whether it was a poor decision to use ns_returnfile to...um...return a file.

Eric
__
Eric Larkin
Chief Technology Officer
Arena Solutions
[EMAIL PROTECTED]
4100 E. Third Ave. | Suite 300 | Foster City | CA 94404
tel: 650.513.3502 | fax: 650.513.3511
Re: [AOLSERVER] Data corruption with fastpath caching
John Caruso wrote:
> No, you're still not understanding how the patch works.

Ok, I'll admit that I misread it at first, but you're also not understanding why I'm saying it will still break.

> I'm surprised you're taking such an all-or-nothing view now,

I don't think I'm taking an all-or-nothing view at all, I just think your solution isn't the right one.

> given that you started out being open to discussion. This patch certainly does solve the problem generally--in all but what I'd say are pathological cases, and certainly in any standard usage (like the various examples I've posted). And it does it by directly addressing the fastpath algorithm's reliance on mtime, which has only one-second granularity.

Pathological cases are exactly the problem. I'm only speaking for myself of course, but I suspect that at least a few others would agree that your case is pathological itself, and not at all standard usage.

I can very easily come up with a scenario that breaks your patched fastpath just as easily as the original, to which you can rightly say, "but why would you do it that way?" And you would be right. That is the exact same thing that has been said repeatedly on this thread to you: why are you doing it that way? You probably have valid reasons and in any case I'm in no position to question your reasons. That doesn't make your case any less pathological than some other one.

-J
Re: [AOLSERVER] Data corruption with fastpath caching
Derek wrote:
> how is this thread still alive.

I think this bikeshed should be painted blue. See http://www.bikeshed.com/ if you don't understand this.

-J
Re: [AOLSERVER] Data corruption with fastpath caching
On Wed, 2008-08-20 at 12:23 -0700, Eric Larkin wrote:
> On 8/20/08 11:29 AM, John Caruso [EMAIL PROTECTED] wrote:
> > Whether or not that's so, the fact is that everyone on this list appeared to share the same utterly natural assumption that ns_returnfile X really will return file X
>
> All, I've been on vacation or I would have chimed in earlier, but as John's client and CTO of the company who found the problem (and is now faced with a fairly extensive and difficult impact assessment to determine whether the confidentiality and integrity of our customers' data has been compromised), I find the suggestion that this is not a bug to be utterly baffling.

Eric,

I'm not sure what your qualifications are to determine if it is a bug or not. The author of the code doesn't seem to think it is a bug. Everyone agrees that the code works as intended. It was no secret at the time the code was written that the file mtime granularity is one second. When fastpath was added many years ago, it was documented in the changelogs. There are configuration parameters in the config file.

I just sent an email responding to John's suggested patch. It is a great suggestion for several reasons, the most important being that it doesn't change the intended purpose of the cache or the API. As John said, there is no visible impact on the user. I would even go so far as to suggest that the wait time (2 sec) be added as a configuration parameter, although the semantics should be discussed.

This patch may fix your initial problem, but it does nothing to fix the broken use of ns_returnfile. If you are serious about not exposing sensitive information, don't write it to disk as a file. Most security breaches don't happen by accident. I have outlined how you can avoid the problem using ns_returnfp, _AND_ a particular series of commands. No single API will serve as some kind of shield of protection; it takes a lot of effort. Anything involving files opens up a whole series of problems. They are not bugs.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Wednesday 12:30 PM 8/20/2008, Jeff Rogers wrote:
> I can very easily come up with a scenario that breaks your patched fastpath just as easily as the original, to which you can rightly say, "but why would you do it that way?" And you would be right.

Do it, then. This is the simplest example I've given that exhibits this bug:

    eval exec /some/external/program --output-file $tempfile
    ns_returnfile 200 text/plain $tempfile
    ns_unlink $tempfile

This is ALL that's required. No external meddling, no munging of file modification times, nothing else. Three lines of Tcl code. By all means, show me an example that defeats this patch that's anywhere near as simple as that.

And far, far more to the point: that's as NATURAL as that. I'd assert that that example code is perfectly intuitive--and right up to the point where I pointed out this bug in the first place, it would have been accepted without remark on this mailing list. *Now*, of course, serving anything but 5-year old files with ns_returnfile is proof that one should give up computers for a living, and not recognizing the holy contract we enter into with AOLserver when we invoke ns_returnfile that our files must continue to exist for at least one more second is a valid reason to contemplate suicide to end our worthless existence.

Look: we discovered a serious bug in AOLserver's fastpath caching mechanism that can cause both data corruption and information leakage. I've explained that bug carefully, in the face of confusion, obfuscation, and a continual stream of utterly unnecessary abuse. I've offered a tested patch that provides a minimal, correct-by-inspection, user-transparent, massive improvement over the current behavior, despite my near certainty (based on previous experience) that it'd just lead to another round of carping, finger-pointing, and refusal to accept that there's even a problem here in the first place. I've explained why the patch makes sense (and in fact addresses exactly the limitation that should have been considered when the fastpath caching mechanism was initially designed). I've responded to all serious concerns, respectfully and without returning any of the bile that's been sent my way.

For anyone who's serious about securing your installation, you have the patch now, and I'd strongly suggest that you apply it to your AOLserver sources. I'll still respond to serious concerns that don't just rehash the same excuses, but otherwise I'm done.

- John
Re: [AOLSERVER] Data corruption with fastpath caching
Hi Folks,

I agree with Eric, even though I wrote the original code and was one of the first to suggest it wasn't a bug. This thread has surprised me in a few ways:

-- The bug was indeed subtle and curious
-- The debate on dynamic vs. static and underlying assumptions and performance was well reasoned
-- The name of the command does indicate something it is not in a way that matters
-- Folks generally underestimated the impact of this bug except for those affected
-- The direct personal attacks were a bit embarrassing to watch

Overall, seems like:

-- Patching underlying fastpath to be filename-based keys makes sense if it's confirmed to solve the problem
-- What ns_returnfile does is good for some things if you know what it's doing
-- Sending dynamic content with the current ns_returnfile isn't a good idea

For Eric and John, I'd recommend a Tcl-based ns_returnfile wrapper using open/ns_returnfp/close as a quick first step.

BTW: The underlying cause -- the rapid re-use of inodes -- is indeed a behavior of at least Linux but not Mac OS X. Compare:

Linux:

    [EMAIL PROTECTED]:~$ rm -f foo ; touch foo ; stat -c %d.%i foo ; rm foo ; touch foo ; stat -c %d.%i foo
    2049.712963
    2049.712963

Mac OS X:

    [JimBook:~] jimbo% rm -f foo ; touch foo ; stat -f %d.%i foo ; rm foo ; touch foo ; stat -f %d.%i foo
    234881026.4090565
    234881026.4090566

I find this very interesting. But, maybe I'm odd; I know most of you have probably long since begun to ignore this thread and it should probably die now...

Cheers,
-Jim
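The open/ns_returnfp/close wrapper suggested in this message might look roughly like the following. It is a minimal sketch, not a tested replacement: the proc name returnfile_nocache is hypothetical, and it assumes ns_returnfp takes a status, a MIME type, an open channel, and a byte count.

```tcl
# Hypothetical helper, not part of AOLserver: send a file through an open
# channel with ns_returnfp instead of going through ns_returnfile.
proc returnfile_nocache {status type filename} {
    set fp [open $filename r]
    # Binary mode so the bytes sent match the size reported by [file size].
    fconfigure $fp -translation binary
    # Make sure the channel is closed even if sending fails.
    set failed [catch {
        ns_returnfp $status $type $fp [file size $filename]
    } result]
    close $fp
    if {$failed} {
        return -code error $result
    }
    return $result
}
```

A handler would call returnfile_nocache 200 text/plain $tempfile where it previously called ns_returnfile, and could then ns_unlink the file as before.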
Re: [AOLSERVER] Data corruption with fastpath caching
Hello John,

I just think that you're introducing complexity into a common and widely used infrastructure (with a hack), just to hide the flaws on your side.

I don't like this hack. It's not the correct thing to do, not even if you make some parameters (like the timeout) changeable via the configuration file. It just doesn't feel good to me. But, of course, you are free to patch your own AOLserver. I can't (and won't) complain about it.

Someone said that this was not an academic discussion... but I am afraid that to some extent it is. We've been saying that the cache can't know whether its contents are still good if changes happen outside its scope and it is never notified of them. But since Linux version 2.6.13, inotify is in the kernel, and AOLserver can subscribe to a path and so know whether that file has been deleted, modified, or anything else. That's how a cache can know that a file has been altered in any way and should be marked as dirty. As easy as that... but I know it is harder than that one-line patch.

Now you can do whatever you want... as long as you don't ask me to patch my code with the dirty hack you proposed. I don't like having AOLserver's SVN version patched with this.

Best Regards,

Juan José
-
Juan José del Río | (+34) 616 512 340 | [EMAIL PROTECTED]
Simple Option S.L.
Tel: (+34) 951 930 122
Fax: (+34) 951 930 122
http://www.simpleoption.com
Re: [AOLSERVER] Data corruption with fastpath caching
On Wednesday 01:45 PM 8/20/2008, Juan José del Río wrote:
> But, since Linux version 2.6.13, inotify is into the kernel, and aolserver can subscribe to a path, so know if that file has been deleted, modified, or anything else. That's the way a cache can know if the file has been altered in any way, and it should be marked as dirty. As easy as that... but I know it is harder than that one-line-patch.

inotify isn't available as of Redhat Enterprise Linux 4, and I'm sure there are other major distributions that are missing it as well--so while it may be the best approach eventually it won't solve the problem right now. Also, not all platforms use inotify, so adding filesystem monitoring support would require complex cross-platform code (which in some cases won't even be available).

So while I'd agree with you that the eventual (ideal) solution is a major rewrite of the fastpath caching code to use some sort of filesystem monitoring technique, that won't work now. By comparison, the fix I offered will work just fine on all the platforms that AOLserver runs on, right now, and it fixes all non-pathological use cases (i.e., all but ones that are artificially designed to trip it up).

As to whether it's a hack, possibly, but so is the fastpath code--that's the whole problem. The patch corrects a specific flaw in the original code: that fastpath never should have been caching files that were modified within the last second, since the underlying OS mechanism (mtime) doesn't provide the resolution necessary to distinguish reliably between two such files. In other words, the patch is in the same spirit as the original code--so if you really don't want such ugliness in your code you should probably just strip out fastpath caching entirely. :-)

- John
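The rule described here, never cache a file whose mtime is too recent to be trusted, can be sketched as follows. This models the idea only and is not the patch itself; the proc name and the exact comparison are assumptions.

```tcl
# Sketch of the idea, not the actual patch. Times are whole seconds.
# A file is cacheable only once at least $grace seconds have passed since
# its mtime, so any later rewrite must carry a different mtime.
proc cacheable {mtime now {grace 2}} {
    return [expr {$now - $mtime >= $grace}]
}

# Timestamps expressed as seconds since midnight.
set written [expr {13*3600 + 50*60 + 18}]   ;# file written at 13:50:18
set served  [expr {13*3600 + 50*60 + 21}]   ;# first request at 13:50:21

set oldEnough  [cacheable $written $served]                  ;# 3 s old
set tooFresh   [cacheable $served $served]                   ;# 0 s old
set oneSecond  [cacheable $written [expr {$written + 1}] 1]  ;# grace of 1 s

puts "3 s old, grace 2: $oldEnough"
puts "0 s old, grace 2: $tooFresh"
puts "1 s old, grace 1: $oneSecond"
```

With a grace of one second the test reduces to "the mtime second has fully elapsed", which is the minimum needed for the mtime comparison to be meaningful.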
Re: [AOLSERVER] Data corruption with fastpath caching
On Wed, Aug 20, 2008 at 12:22 PM, Jeff Rogers [EMAIL PROTECTED] wrote:
> Linux. This tcl page:
>
>     set fn /tmp/tmpfile[expr rand()]
>     set f [open $fn w]
>     puts $f [ns_queryget data]
>     close $f
>     after 2
>     ns_returnfile 200 text/plain $fn
>     ns_unlink $fn

[after 2] would wait 2 milliseconds. [after 2000] or [ns_sleep 2] might make a difference(?).
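A quick way to see the unit difference, runnable in plain tclsh (ns_sleep exists only inside AOLserver, so it is left out of this sketch):

```tcl
# [after N] with no script blocks for N milliseconds, not seconds.
set start [clock clicks -milliseconds]
after 2
set short [expr {[clock clicks -milliseconds] - $start}]

set start [clock clicks -milliseconds]
after 2000
set long [expr {[clock clicks -milliseconds] - $start}]

puts "after 2    took about $short ms"
puts "after 2000 took about $long ms"
```

Only the second delay is long enough to push a file's mtime into a different second than the moment it is served.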
Re: [AOLSERVER] Data corruption with fastpath caching
Hello again John,

Oops, sorry. I thought RedHat Linux had everything an Enterprise needed ;-) (Take the above line with a lot of salt. I don't want to discuss Linux distros... yet ;-)

Then what about adding a -nocache parameter to the function? That way you won't modify the original behaviour... Since the solution is not perfect, if we keep piling on patches we'll end up in a hell made of a pile of patches.

In fact, I feel that checking the filename instead of the last modification time is a better way to patch the if clause. That way it's safer, because we can be sure there will be no collisions.

Anyway, as someone said, I am amazed at how ext3 reuses inode numbers... I didn't know it was so aggressive :-)

Regards, and I hope you can finally serve your customers right (no matter how you solve this problem),

Juan José
-
Juan José del Río | (+34) 616 512 340 | [EMAIL PROTECTED]
Simple Option S.L.
Tel: (+34) 951 930 122
Fax: (+34) 951 930 122
http://www.simpleoption.com
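The difference between an inode-based key and a filename-based key can be seen with plain Tcl. This is an illustrative sketch: the proc names and the /tmp scratch directory are assumptions, and whether the inode is actually reused depends on the platform and filesystem.

```tcl
# Two candidate cache keys for a file.
proc key_by_inode {path} {
    file stat $path st
    return "$st(dev).$st(ino)"
}
proc key_by_name {path} {
    return [file normalize $path]
}

set dir [file join /tmp "keydemo_[pid]"]
file mkdir $dir
set a [file join $dir a.txt]
set b [file join $dir b.txt]

# Create a.txt, record its keys, then delete it.
close [open $a w]
set inodeA [key_by_inode $a]
set nameA  [key_by_name $a]
file delete $a

# Create b.txt; on some filesystems it receives a.txt's old inode.
close [open $b w]
set inodeB [key_by_inode $b]
set nameB  [key_by_name $b]

puts "inode keys equal: [expr {$inodeA eq $inodeB}]"
puts "name keys equal:  [expr {$nameA eq $nameB}]"

file delete -force $dir
```

A filename key keeps differently named files apart even when the inode is recycled; it does not by itself help when the same name is reused, which is where the mtime check still matters.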
Re: [AOLSERVER] Data corruption with fastpath caching
John,

Your last patch suggestion seems good; not caching something that looks like a new file is fully in line with the intent of fastpath and ns_returnfile.

I'm not sure everyone is commenting on this new patch idea, maybe a previous idea? Anyway, if the cache is for serving static (and likely older than a few seconds) content, why cache it before it looks static? I think that is the basic thrust of the patch. It would be impossible to poison the cache by accident if you wait until a file ages a few seconds before you stick it in.

But the poison is really self-inflicted. The only data on the server that can be unintentionally exposed is something that made it into the cache in the first place. Stuff only makes its way into the cache by filename. So if your application sends out secret files via ns_returnfile, you could have a problem, but no long-lived secret file will ever fall into this trap by accident: it would have to cease to exist (giving up its inode) after being placed into the cache, then a new file would have to be created and served. All of this assumes that your webserver process can read the secret file.

BUT! Please pay attention to this: ns_returnfile is by definition not safe in the context of most webserver APIs. It returns content that is outside of pageroot. It exposes (to Tcl) an internal API which handles returning files under pageroot. The reason it is exposed is so that developers can easily create their own filehandlers and virtual servers. The internal API handles a handful of annoyingly picky but important and standard website and HTTP features. It also provides a number of hooks for customization related to these features.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On 21/08/2008, at 1:45 AM, Jim Davidson wrote:
> I looked at the code a bit closer. The ns_returnfile and ns_respond commands both call Ns_ConnReturnFile, the public API to the underlying FastPath. It does more than just blast the content -- it handles:
> ...
> -- caches, mmap's, or simply opens the fd and sends, chunk by chunk

not directly related to the issue at hand, but if this is being worked on shouldn't the file returning be handled by sendfile() on platforms that support it? that'd bypass fastpath, but on linux at least you'd not be wasting RAM by buffering in both the page cache and fastpath, and spitting the data back out the connection socket gets handled entirely in the kernel...

just a thought...

cheers

Russell
Re: [AOLSERVER] Data corruption with fastpath caching
On Mon, 2008-08-18 at 19:20 -0700, John Caruso wrote:
> I'd say it's still better, because it requires explicit action on the user's part to enable the flawed caching mechanism in that case. And actually I don't think fastpath in its default configuration would be of much help in performance terms these days, given that the cache is only 5MB large and file data is typically cached by the OS anyway (and servers generally have far more RAM than they did even five years ago).

fastpath is for small static content. You don't need to cache large files, and that is why the cachemaxsize parameter gives you a cutoff on the largest size to cache. AOLserver has great performance on small files, fastpath speeds it up further, plus the overall scheme handles directory files, internal redirects, etc.

> I do think this should have been considered (and steps taken to address it) when the fastpath caching mechanism was initially developed, since it's a glaring flaw. I've designed things that rely on shaky underlying assumptions in the past, but only in controlled circumstances where those assumptions were guaranteed to obtain. I can think of situations in which a caching mechanism with this type of design limitation wouldn't be an issue, but in my opinion it has no place being a default-enabled mechanism in an enterprise-grade web server.

Why not just write another API which strips out all the things you don't like? I think you misjudge fastpath in every way, but whatever.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Mon, Aug 18, 2008 at 06:06:23PM -0700, John Caruso wrote:
> That'd be an improvement over the current situation, but it's still the case that AOLserver as currently shipped has a file cache mechanism built into it which 1) may return incorrect data and 2) is enabled by default. Given the risk, I'd say fastpath caching should be disabled by default rather than enabled.

Sounds right to me. Either robustify Fastpath somehow against this corner case, or don't have Fastpath turned on by default.

--
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com/
Re: [AOLSERVER] Data corruption with fastpath caching
This would be a wonderful addition to the documentation. As a matter of fact, I just added it: http://panoptic.com/wiki/aolserver/Fastpath

For what it's worth, it seems to me that if it has a measurable benefit, it's worth leaving on by default, as long as developers are properly educated about design issues (flaws, bugs, tradeoffs, whatever) that they need to deal with. If it's off by default it may as well be removed entirely. I say on by default, but well-documented so that developers are forced to have at least a cursory understanding of it when doing anything that might relate to it.

Titi Ala'ilima
Lead Architect
MedTouch LLC
1100 Massachusetts Avenue
Cambridge, MA 02138
617.621.8670 x309

-Original Message-
From: AOLserver Discussion [mailto:[EMAIL PROTECTED] On Behalf Of Tom Jackson
Sent: Tuesday, August 19, 2008 1:18 AM
To: AOLSERVER@LISTSERV.AOL.COM
Subject: Re: [AOLSERVER] Data corruption with fastpath caching

On Tue, 2008-08-19 at 12:24 +1000, russell muetzelfeldt wrote:
> On 19/08/2008, at 11:59 AM, Tom Jackson wrote:
> > On Tue, 2008-08-19 at 11:37 +1000, russell muetzelfeldt wrote:
> > > On 19/08/2008, at 10:56 AM, Tom Jackson wrote:
> > > > You want a transactional database but you are using a filesystem. Grow up.
> > > and
> > > > If your application wasn't the responsible party which violated the expectation you state, I would agree (maybe).
> > > please go and re-read this thread, and get your parties straight.
> > Sorry, I don't follow.
> ok, I'll spell it out. it's not my application that's violated the expectation I state. you haven't been paying attention to the From: headers, and seem to have mistaken me for the original poster of this thread.

Ah, okay. I didn't mean to point to any particular application; by "your" I didn't mean any particular "you" or "your".

> all I've been saying is that ns_returnfile filename returning the content of something other than filename, contrary to the documentation and common sense, is a bug. given that fastpath exists for a (good) reason, and that the behaviour which triggers the bug is marginal anyway, the correct response is the bug will not be fixed, here's why, and here's how to work around it.

It is an interesting point. But it isn't a bug. The purpose of the API is to return a static file, not one which changes in under a second. It is not a bug to not support code which is guaranteed to be slower than common alternatives.

Fastpath is designed to support return of smallish static content. It isn't some ancient way of speeding up stuff that was slow, it was for speeding up stuff that was already fast but was easy to make even faster.

If you want to avoid use of fastpath, just set the configuration lower than your dynamic content:

    #
    # Fastpath
    #
    ns_section ns/server/${server}/fastpath
    ns_param cache         [set cache 10] ;# max entries ??
    ns_param cachemaxsize  [set cachemaxsize [expr 5 * 1024 * 1024]]
    ns_param cachemaxentry [expr round(floor($cachemaxsize/$cache))]

Or, if the dynamic content is very small, or customized, don't write it to a file in the first place. In general you are probably doing something wrong if you write small content to a file and immediately delete it. You are also likely doing something wrong if you are caching large files.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
I agree with Titi.

The vast majority of the time, having Fastpath on does no harm at all. Having it disabled by default would be like not using computers because they fail sometimes. That would be too extreme, wouldn't it? ;-)

As long as it's well documented, and there are alternatives to avoid the problems, I think it's ok to leave Fastpath activated by default.

Regards,
Juan José

--
Juan José del Río
Chief of Commerce
Simple Option S.L.
Avda. Editor Angel Caffarena 11, B11, 1B
Málaga, 29010, Spain
+34 616 512 340 cell
+34 951 930 122 tel/fax
Re: [AOLSERVER] Data corruption with fastpath caching
Andrew,

This is not a corner case. The exact same thing could happen without fastpath. What is that thing? That the contents of a file change after a request is made and before the file is returned. In fact, there is no guarantee that it won't change mid-return. This is a fact of life with files on any filesystem. In fact, with the HTTP caching mechanisms, you could fail to get up-to-date contents of a file, since the If-Modified-Since mechanism will also fail.

The problem here is that the application is using this static file handling API to serve dynamic content. Wondering why it doesn't work is pointless.

Just to summarize again, this case requires that a file is created then destroyed and another file created within the same second that has the same size. Also, the original file must get into the cache, and the only way that can happen is for the application to treat it as a long-lived static file.

We have other means to cache dynamic data, and large chunks of dynamic content saved as a file can avoid the fastpath cache by setting the cachemaxsize parameter. Writing smaller content to disk doesn't make any sense if your goal is speed...or security.

It is probably even more important to tamp down these misconceptions about how AOLserver works. Static and dynamic content are handled by different APIs. The reason is that it has long been recognized by the developers of AOLserver that different techniques are required to maintain high performance based upon how the content is generated, its expected lifespan, its size, and its potential for reuse.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
What about using epoll (or equivalent) in Linux, and kqueue in FreeBSD, to tell the kernel to notify AOLserver in case a file has changed? That'd be a pretty easy and efficient way to discard fastpath items in case they have been deleted and/or modified.

Just my two cents ;-)

-
Juan José del Río | (+34) 616 512 340 | [EMAIL PROTECTED]
Simple Option S.L.
Tel: (+34) 951 930 122
Fax: (+34) 951 930 122
http://www.simpleoption.com
Re: [AOLSERVER] Data corruption with fastpath caching
Tom Jackson wrote:
> If you want to avoid use of fastpath, just set the configuration lower than your dynamic content:
>
>     #
>     # Fastpath
>     #
>     ns_section ns/server/${server}/fastpath
>     ns_param cache [set cache 10] ;# max entries ??
>     ns_param cachemaxsize [set cachemaxsize [expr 5 * 1024 * 1024]]
>     ns_param cachemaxentry [expr round(floor($cachemaxsize/$cache))]

The description of the parameters here is a little confusing. Browsing the source, it appears that cache is a flag to enable or disable fastpath, cachemaxsize is the maximum size of the cache, and cachemaxentry is the largest size of a file that will get cached. There is no setting for the max number of entries; the use of $cache in the settings above (reflecting the server defaults) is really a minimum number of cache entries (i.e., the default cache will hold at least 10 entries of the max 512k size, but it could also hold 1000 5k files).

I didn't dig deep enough to see how the cache flushing works, but on casual perusal it looks like the cache is pruned by removing the oldest entries (not largest, least hit, or least recently hit).

-J
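The arithmetic behind Jeff's reading of the parameters can be checked with a short sketch. This is plain Python rather than AOLserver code: the variable names only mirror the ns_param keys, the 5k file size is the illustrative figure from the message above, and any per-entry overhead the real cache may have is ignored.

```python
# Values taken from the quoted config; names mirror the ns_param keys.
min_entries = 10                             # the "cache" value reused as a divisor
cachemaxsize = 5 * 1024 * 1024               # total cache size in bytes (5 MB)
cachemaxentry = cachemaxsize // min_entries  # largest single file that gets cached


def files_that_fit(file_size: int) -> int:
    """Number of files of file_size bytes that fit in the whole cache."""
    return cachemaxsize // file_size


print(cachemaxentry)                  # 524288 bytes, i.e. 512 KB
print(files_that_fit(cachemaxentry))  # 10 files at the maximum entry size
print(files_that_fit(5 * 1024))       # 1024 files of 5 KB each
```

So the 10 in the config acts as a floor on the entry count rather than a ceiling, which matches the "at least 10 entries, or roughly 1000 small files" description.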
Re: [AOLSERVER] Data corruption with fastpath caching
Hi folks,

I wrote the code. The explanation below is correct -- I chose the inode/dev combination to cache the same file even with multiple names, which was the case at AOL -- hundreds of symlinks and hard links to the same file. The same strategy is used for ADP templates. I think the code uses just filenames on Windows because the inode/dev don't really exist in Win32 weirdness (or at least I didn't care enough to find the proper analog).

As for whether this is a bug or not, opinions vary. I would suggest the code snippet of "create temp file and use fastpath to return contents" is not a use case I was solving for or recommend. The suggestion to open a temp fd, unlink it, dump content into it and send from the open fd seems the better approach for a few reasons, including proper cleanup after a crash. In fact, there is an API for such cases -- Ns_GetTemp or something -- and it's used internally, for example, to spool large file uploads. It re-uses open and unlinked fd's -- in practice file create is expensive and is avoided by just keeping a cache of open fd's around, truncating the content at the end of the connection. I'm not sure if the docs are up to date and/or if there are useful Tcl commands, but something could be added easily if needed.

Having said all that, a note in the docs that ns_returnfile is designed for truly static content... and comments on how the cache works and can be disabled would make sense.

-Jim

On Aug 18, 2008, at 7:37 PM, Tom Jackson wrote:
> On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:
>> While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case.
>
> I don't think it is a bug in fastpath. Think about the case where multiple logical files are actually the same physical file. Using the name would result in caching the same object under different names. This is a much more likely situation than this so-called bug.
>
> tom jackson
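The "open a temp fd, unlink it, dump content into it and send from the open fd" approach Jim describes can be sketched as follows. This is an illustrative Python model on a POSIX system, not the AOLserver internals; the helper name spool_to_unlinked_temp is hypothetical, and the single os.write call assumes a small payload.

```python
import os
import tempfile


def spool_to_unlinked_temp(content: bytes) -> int:
    """Write content to a temp file that no longer has a name; return the open fd."""
    fd, path = tempfile.mkstemp()   # create and open a uniquely named file
    os.unlink(path)                 # drop the name; the open fd keeps the data alive
    os.write(fd, content)           # small payload, so one write is assumed to suffice
    os.lseek(fd, 0, os.SEEK_SET)    # rewind so the caller reads from the start
    return fd


fd = spool_to_unlinked_temp(b"ABC123\n")
try:
    data = os.read(fd, 1024)        # stands in for "send from the open fd"
finally:
    os.close(fd)                    # storage is released once the last fd closes
```

Because the file has no name once it is unlinked, nothing is left behind if the process crashes, and there is no path for a name- or inode-keyed cache to look up.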
Re: [AOLSERVER] Data corruption with fastpath caching
On Tuesday 10:40 AM 8/19/2008, Jim Davidson wrote:
> I would suggest the code snippet of "create temp file and use fastpath to return contents" is not a use case I was solving for or recommend.

It's also not the use case in question--just a simple illustration of the problem. Here's a more realistic template of a use case (which closely mirrors the actual code that led to the discovery of the bug):

    eval exec /some/external/program --output-file $tempfile
    ns_returnfile 200 text/plain $tempfile
    ns_unlink -nocomplain $tempfile

In other words, run an external program that writes its output to $tempfile, return that file to the user, and delete the file. This is a case in which ns_returnfile seems like the obvious and appropriate call--but if this procedure is run on behalf of users A and B within the same second (which is common on an active web server), and the results in $tempfile are the same length, B will get A's output. Depending on what information the external program writes to $tempfile, this could easily represent a security breach.

That example involves timing between two different users, but something like the following will also trigger the bug:

    foreach user $users {
        eval exec /some/external/program --output-file $tempfile --user $user
        ns_returnfile 200 text/plain $tempfile
    }

Again, this code looks perfectly appropriate, but it's very likely to return incorrect data due to this bug. Note that the ns_unlink isn't even required in this case.

Also, regarding "use fastpath to return contents": the developer in this case didn't know fastpath from a hole in the ground--after all, they were calling ns_returnfile, not fastpath. fastpath is just the behind-the-scenes mechanism that was making "ns_returnfile X" return a file other than X. And generally speaking, I'd say it's perfectly reasonable for a developer to believe that "ns_returnfile X" actually will return file X.

- John
Re: [AOLSERVER] Data corruption with fastpath caching
John,

It is not a bug in ns_returnfile.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Monday 05:53 PM 8/18/2008, Jeff Rogers wrote:
> russell muetzelfeldt wrote:
>> fastpath is making assumptions about what means something is the same file, and those assumptions are not consistent with unix filesystem semantics - how is this not a bug?
>
> It's not a bug because no one ever said that it *was* strictly following "unix filesystem semantics", which isn't even a single thing (ufs is slightly different than nfs, is slightly different than ext2 -noatime, is slightly different than afs, etc.) It is following a particular definition: if the file still exists and has the same dev/inode/mtime/size as it did when you last checked, then it is the same file. Think of it as an if-modified-since or if-none-match conditional GET.

Actually that's not analogous, for the same reason that the analogies to caching of attributes in NFS, rsync or tar not noticing content changes if attributes stay the same, etc, don't apply: because this bug can happen *even with two files that have completely different names or paths*. Again, in this example...:

    set file [open /var/tmp/myfile w]
    puts $file ABC123
    close $file
    ns_returnfile 200 text/plain /var/tmp/myfile
    ns_unlink -nocomplain /var/tmp/myfile

    set file [open /var/tmp/myotherfile w]
    puts $file XYZ987
    close $file
    ns_returnfile 200 text/plain /var/tmp/myotherfile
    ns_unlink -nocomplain /var/tmp/myotherfile

...AOLserver will almost always return the contents of /var/tmp/myfile rather than /var/tmp/myotherfile in response to the second ns_returnfile.

I think the analogies to other systems aren't really germane anyway--AOLserver's behavior has to be judged on its own merits. But adopting that standard, I can't think of any other program that would confuse /var/tmp/myfile with /var/tmp/myotherfile.

- John
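The collision in the example above can be modelled in a few lines. What follows is a toy simulation in Python, not the fastpath source: ToyFilesystem, its inode numbers, and the fixed clock are all hypothetical, and the model simply assumes that the filesystem hands a freed inode number to the next file created, which real filesystems often (but not always) do.

```python
from typing import Dict, List, NamedTuple, Tuple


class Stat(NamedTuple):
    dev: int
    ino: int
    mtime: int   # whole seconds, as in a classic stat() result
    size: int


class ToyFilesystem:
    """Hypothetical in-memory filesystem that reuses freed inode numbers."""

    def __init__(self) -> None:
        self.files: Dict[str, Tuple[Stat, bytes]] = {}
        self.free_inodes: List[int] = []
        self.next_ino = 100
        self.clock = 1219100000  # fixed: everything happens within one second

    def write(self, path: str, data: bytes) -> None:
        if self.free_inodes:
            ino = self.free_inodes.pop()   # reuse the most recently freed inode
        else:
            ino = self.next_ino
            self.next_ino += 1
        self.files[path] = (Stat(1, ino, self.clock, len(data)), data)

    def unlink(self, path: str) -> None:
        stat, _ = self.files.pop(path)
        self.free_inodes.append(stat.ino)


cache: Dict[Stat, bytes] = {}


def return_file(fs: ToyFilesystem, path: str) -> bytes:
    """Serve a file through a cache keyed on the stat tuple, not on the name."""
    stat, data = fs.files[path]
    if stat not in cache:
        cache[stat] = data
    return cache[stat]


fs = ToyFilesystem()
fs.write("/var/tmp/myfile", b"ABC123\n")
first = return_file(fs, "/var/tmp/myfile")
fs.unlink("/var/tmp/myfile")

fs.write("/var/tmp/myotherfile", b"XYZ987\n")
second = return_file(fs, "/var/tmp/myotherfile")
print(first, second)  # both are b'ABC123\n': the second call hits the stale entry
```

The second file gets the same device, inode, mtime (to the second) and size as the first, so the lookup cannot tell them apart even though the paths differ. Changing the length of either payload, or advancing the clock between the writes, makes the keys differ and the problem disappears in this model.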
Re: [AOLSERVER] Data corruption with fastpath caching
John Caruso wrote:
>> Think of it as an if-modified-since or if-none-match conditional GET.
>
> Actually that's not analogous, ...

I didn't mean to say it was exactly the same, just similar in that given a particular system that makes particular assumptions it is possible to construct a situation where the results are unexpected or incorrect in a particular way.

I think by now everyone reading this understands the problem. What's not clear is what you are expecting to happen now. Documentation has been updated to reflect awareness of this problem and caution against using ns_returnfile in this situation and suggesting alternate solutions in the client code. Some code fixes have been proposed, which for various reasons are undesirable or simply won't fix the problem. A default configuration change was suggested which seems generally viewed as undesirable. What more are you looking for?

-J
Re: [AOLSERVER] Data corruption with fastpath caching
On Tuesday 02:10 PM 8/19/2008, Jeff Rogers wrote:
> A default configuration change was suggested which seems generally viewed as undesirable.

My impression was that support was split about evenly, actually. I take it that means you're against changing the default? I'm a bit surprised, since you started out agreeing that it's a bug.

Personally I can't imagine any persuasive argument that a caching mechanism that can easily confuse /usr/local/private/var/rootpass and /var/tmp/verisign/certs/webcert.txt should be enabled by default in a web server.

For anyone thinking, "well, you're the only one who's ever seen this bug", I'd say "no, we're just the first ones to discover this bug". It's quite possible that other people have run into it without knowing it, since AOLserver will just silently serve the wrong data.

As for what I want, as I said, I was mainly bringing this up to shine a light on the issue and see what other people's thoughts were. That's been helpful in particular because I hadn't considered the security implications, which are quite serious; I may raise this issue on security forums as well so that people using ns_returnfile are aware of the danger of silent data corruption and/or information leaks and can review their code accordingly.

- John
Re: [AOLSERVER] Data corruption with fastpath caching
John,

This isn't a democracy. You have to demonstrate some understanding of how things work. The only real security issue is your misuse/abuse of ns_returnfile to serve dynamic data. Nobody is going to guarantee that you can't shoot yourself in the foot due to your lack of understanding of writing robust code, or how to configure and maintain a secure internet application, or take advice on how to do so.

But please, go tell the security police about our insecure file commands.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
> Personally I can't imagine any persuasive argument that a caching mechanism that can easily confuse /usr/local/private/var/rootpass and /var/tmp/verisign/certs/webcert.txt should be enabled by default in a web server.

Oh, come on. Only if you're rapidly creating and deleting these files.

I think it's interesting, at least, that this topic has created more traffic than the list usually sees in a whole year.

Rusty
Re: [AOLSERVER] Data corruption with fastpath caching
On Tuesday 04:57 PM 8/19/2008, Rusty Brooks wrote:
>> Personally I can't imagine any persuasive argument that a caching mechanism that can easily confuse /usr/local/private/var/rootpass and /var/tmp/verisign/certs/webcert.txt should be enabled by default in a web server.
>
> Oh, come on. Only if you're rapidly creating and deleting these files.

Yes, I've explained the conditions several times. The point was that the files can be in completely different locations in the filesystem with completely different names, and may have secure contents.

Again: this is not an academic point. This is an actual bug encountered in actual code, resulting in data corruption (effectively) and possible information leakage--and all because "ns_returnfile X" may not actually return file X. I don't doubt that there are other people who are also at risk due to this behavior of ns_returnfile/fastpath. If it's no big deal for you, great, but the security implications are nonetheless serious.

- John
Re: [AOLSERVER] Data corruption with fastpath caching
Hi,

I haven't looked at a directory change notification type scheme in a long time but that could be very clever. Aside from addressing issues discussed here, the key benefit would be to avoid the repeated stat syscalls. Those stat calls always bothered me conceptually but the performance of the underlying systems always improved faster than my irritation would grow to do something about it. However, we were always careful to run websites against local filesystems - I would be more concerned with the overhead if we were using NFS or some other shared filesystem thing.

Somewhat related, the dci module (a series of AOL extensions we open sourced a while back) includes some content fetch/caching features called "sob". That had the model you described -- things stayed in the cache until either space was needed or the server received an explicit flush message on a publish event. That approach worked well and scaled well but it wasn't entirely general nor naive, i.e., it was key that we understood how it worked under the covers and to make sure the flush message links were reliable to avoid stale content problems.

Anyway, I've been pondering this whole discussion some more and agree with Tom -- the fastpath isn't broken. It just does a certain thing -- serves static files with a reasonable balance of performance and stability -- and shouldn't be modified except to add notes about how it works in the docs. I'm having trouble thinking through how it could be modified to plug all possible race conditions. I'd suggest the code snippets using fastpath for dynamic content should be modified, perhaps some new Tcl commands could be added to make it convenient, but otherwise it seems a mismatch between capabilities and requirements.

-Jim
Re: [AOLSERVER] Data corruption with fastpath caching
On 20/08/2008, at 9:57 AM, Rusty Brooks wrote:

> I think it's interesting, at least, that this topic has created more traffic than the list usually sees in a whole year.

most of which isn't actually about the issue at hand, but rather whether John is an idiot for expecting ns_returnfile to behave as documented.

cheers
Russell
Re: [AOLSERVER] Data corruption with fastpath caching
On Tuesday 05:39 PM 8/19/2008, Jim Davidson wrote:

> You're right, the code snippet below could trip over a race condition as you've described.

It's not a race condition, actually; the code in that example was serialized, so there's no race involved.

> ...fastpath isn't broken.

It's designed in such a way that it can return incorrect results (and ones that are wildly outside of reasonable expectations). Whether or not that's broken is a judgment call, and there we apparently differ, though I find that surprising--just because bad behavior can be documented and avoided doesn't mean it's not bad behavior.

> Anyway, for your app, it might be easiest to not change your code but instead write a new ns_returnfile to override the builtin -- maybe just with open and ns_returnfp.

Yep, that was essentially my original suggestion to the developers. I can guarantee you that all uses of ns_returnfile will be receiving close scrutiny. :-)

On Tuesday 05:59 PM 8/19/2008, Juan José del Río wrote:

> If you don't want to deactivate it, and have some C skills, I would recommend that you make the needed changes to the fastpath code to enable it to use the kernel facilities of the operating system (if you're using Linux, that'll be the epoll system call; in FreeBSD's case it's kqueue; etc.).

This is an interesting suggestion, but from a quick scan of the epoll man page it doesn't look like it would work in this case, since epoll acts on an open file descriptor, whereas fastpath associates file data with a (dev, inode, mtime, size) tuple without keeping an open file descriptor (and it'd be pretty wonky for AOLserver to keep open file descriptors for all files currently in the fastpath cache).

No matter, though, we've got plenty of workarounds, and we'll probably just disable fastpath entirely since the benefits are likely vanishingly small anyway.

- John
Re: [AOLSERVER] Data corruption with fastpath caching
One other idea could be to modify the code so that the Win32 behavior (cache by filename) could be made a configurable option (perhaps default on) instead of a compile-time choice. It would be easy to fiddle with code snippets like this to make that happen:

    #ifdef _WIN32
        key = file;
    #else
        ukey.dev = stPtr->st_dev;
        ukey.ino = stPtr->st_ino;
        key = (char *) &ukey;
    #endif

Note similar code is in ADP.

The downsides to a filename-based cache are that the keys are bigger strings instead of small fixed-size structures (a tiny cost) and that files which are actually the same via symlinks get cached twice (a cost that could be large or nothing, depending on your use of symlinks). Both cases result in more memory use but perhaps safer results. One positive is that the ns_cache_keys, size, etc. commands would show results for fastpath objects by filename (I think they ignore non-string based cache keys now).

In fact, I suppose you could just add this to the top of fastpath.c and recompile to try it out:

    #define _WIN32 1 /* Get Win32 filename-based cache keys */

As long as your filenames are unique names, this may give you the results you're looking for (although I still think a dynamic app using files should use open/cached fd's).

-Jim
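As a rough illustration of the run-time option suggested above, the key choice could be driven by a flag rather than by `#ifdef`. This is only a sketch under stated assumptions: `make_cache_key` and its `by_name` flag are hypothetical names, not part of fastpath.c, and the dev/inode pair is formatted as text here purely to keep the example short, whereas the real code uses a small binary structure as the key.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: build a cache key into buf.
 * by_name != 0 -> key is the filename (the Win32-style behavior).
 * by_name == 0 -> key is derived from the device and inode numbers.
 * Returns 0 on success, -1 if the key does not fit in buf.
 */
static int make_cache_key(char *buf, size_t buflen, const char *file,
                          const struct stat *st, int by_name)
{
    int n;

    assert(buf != NULL && file != NULL && st != NULL);
    if (by_name) {
        n = snprintf(buf, buflen, "name:%s", file);
    } else {
        n = snprintf(buf, buflen, "inode:%llu:%llu",
                     (unsigned long long) st->st_dev,
                     (unsigned long long) st->st_ino);
    }
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

With `by_name` set, two differently named temp files that happen to reuse one inode get different keys; with it cleared, they collide, which is the situation described in this thread. A same-named file that is rewritten still maps to the same key either way.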
[AOLSERVER] Data corruption with fastpath caching
Consider the following pseudocode snippet:

    ...generate file $myfile in some way...
    ns_returnfile 200 text/plain $myfile
    ns_unlink $myfile

If this snippet is executed in a tight loop on a Linux system, the chances of returning the wrong results are very high due to AOLserver's fastpath caching, which requires the following four attributes to be identical to consider a new file to be a cache hit (as per the FastReturn function in fastpath.c):

1) Same device number
2) Same inode number
3) Same modification time (within one second)
4) Same size

Assuming $myfile is always on the same filesystem, number 1 is taken care of, and Linux reuses inode numbers, so the creation and deletion of $myfile will typically result in a file with the same inode. So in this example, a file that is created within the same second as a preceding file and contains the same amount of data will be considered identical to it, and will be erroneously served from cache.

This isn't just a hypothetical, BTW; a client of mine ran into this issue and spent many weeks trying to figure out what was happening before tracing it back to AOLserver's fastpath caching. And the issue had existed for many years without being detected.

I'm mainly bringing this up to shine a light on the issue and see what other people's views are. It's potentially a very serious issue given that it may silently corrupt data, and the fact that fastpath caching is enabled by default means that people may run into it without even knowing they're exposed to the danger.

The best workaround I can think of (short of a checksum, which would defeat the purpose of caching in the first place) would be to check that the mtime or ctime of the file is at least some threshold number of seconds (e.g. 1 or 2) older than the current time, and not serve the file from cache if it's not. In other words, a file would have to be at least X seconds old (which could be a configurable value) before it could be served from the cache rather than from disk.

Thoughts?

- John
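For illustration only, the four-attribute comparison described in this message can be sketched in C as follows. This is not the code from fastpath.c: the `CacheEntry` type and the functions `entry_from_stat` and `entry_matches` are hypothetical names chosen for the sketch. It shows why a new file that reuses the inode, has the same size, and is written within the same second cannot be told apart from the cached one.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical cache entry holding the four attributes listed above. */
typedef struct {
    dev_t  dev;    /* device number */
    ino_t  ino;    /* inode number */
    time_t mtime;  /* modification time, whole seconds */
    off_t  size;   /* file size in bytes */
} CacheEntry;

/* Record the identity of a file at the moment it is cached. */
static CacheEntry entry_from_stat(const struct stat *st)
{
    CacheEntry e;

    assert(st != NULL);
    e.dev = st->st_dev;
    e.ino = st->st_ino;
    e.mtime = st->st_mtime;
    e.size = st->st_size;
    return e;
}

/* True when a later stat() result looks identical to the cached entry. */
static bool entry_matches(const CacheEntry *e, const struct stat *st)
{
    assert(e != NULL && st != NULL);
    return e->dev == st->st_dev
        && e->ino == st->st_ino
        && e->mtime == st->st_mtime
        && e->size == st->st_size;
}
```

Nothing in this comparison looks at the file's contents or name, so two distinct files that agree on all four fields are treated as one.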
Re: [AOLSERVER] Data corruption with fastpath caching
There is probably someone here that can directly address a better way to do what you want, with ns_cache or some other scheme, but it looks like your basic problem is saving rapidly changing data to disk or serving it from cache. Why do this? If your data is changing faster than once per second, don't keep a copy of it.

It's not a data corruption issue because you are choosing to overwrite the old data with new data using the exact same file name. If the data is important, don't overwrite it, thus no corruption.

But in general it is not a good idea to do things the way you are, which is reading and writing the same file at the same time, which has nothing to do with fastpath. You should use a cond/mutex to serialize access.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Monday 01:33 PM 8/18/2008, Tom Jackson wrote:

> It's not a data corruption issue because you are choosing to overwrite the old data with new data using the exact same file name. If the data is important, don't overwrite it, thus no corruption.

No, you've misunderstood the scenario. The file name needn't be the same to trigger this issue, and the corruption doesn't come from serving data out of a file that's changing, but rather because fastpath caching mistakenly identifies a new file as being identical to a previously-cached file (for the reasons I outlined) and erroneously serves the previously-cached data to the user. This is a design limitation and arguably a bug in the fastpath caching implementation, which is potentially quite serious since it silently serves the wrong data to the user.

If you want a more straightforward (albeit contrived) demonstration of the problem, here you go:

    # First file: written, returned (and cached by fastpath), then removed
    set file [open /var/tmp/myfile w]
    puts $file ABC123
    close $file
    ns_returnfile 200 text/plain /var/tmp/myfile
    ns_unlink -nocomplain /var/tmp/myfile

    # Second file: different name and contents, same size, same second
    set file [open /var/tmp/myotherfile w]
    puts $file XYZ987
    close $file
    ns_returnfile 200 text/plain /var/tmp/myotherfile
    ns_unlink -nocomplain /var/tmp/myotherfile

Assuming that /var/tmp/myfile and /var/tmp/myotherfile are created within the same second, the fastpath caching algorithm will misidentify them as the same file, and ns_returnfile will therefore erroneously return the (previously cached) contents of /var/tmp/myfile when it should be returning the (uncached) contents of /var/tmp/myotherfile.

- John
Re: [AOLSERVER] Data corruption with fastpath caching
John,

Just to be clear: fastpath is for serving static content. This is not what you are doing here; you are creating a temporary file to store dynamic content. For your bug to work you must delete the old file and create a new one within the same second, etc.

Also, your code sequence below will leave temporary files around in the case of a crash. If you want to safely serve the content from this temporary storage, you should unlink the file after you finish creating it (no other thread or process will be able to access the content), or you can unlink it before you write the content, and then even local users will not be able to see the file. Then just send out the contents directly using the fd, not the file name. Maybe something like:

    ns_return 200 [ns_guesstype $myfile] [read $fd]

Then you can close the fd, although AOLserver does that automatically at the end of each request.

Now: why are you writing the content to disk? Can't you use a temp variable?

tom jackson
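The open-then-unlink pattern described in this message can be sketched in C on a POSIX system as follows. This is a minimal sketch, not AOLserver code: `open_unlinked_temp` and `write_then_read` are hypothetical helper names, and the template path passed in is the caller's choice. It shows that once the name is removed the data is still reachable through the descriptor, and that the storage is released when the descriptor is closed.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file from a mkstemp() template
 * (a writable string ending in "XXXXXX") and unlink it right away.
 * Returns an open read/write descriptor, or -1 on failure.
 */
static int open_unlinked_temp(char *tmpl)
{
    int fd;

    assert(tmpl != NULL);
    fd = mkstemp(tmpl);
    if (fd < 0) {
        return -1;
    }
    if (unlink(tmpl) != 0) {   /* the name goes away, the open fd stays valid */
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: write data to fd, rewind, and read it back into
 * buf as a NUL-terminated string. Returns bytes read, or -1 on error.
 */
static ssize_t write_then_read(int fd, const char *data, char *buf, size_t buflen)
{
    size_t len = strlen(data);
    ssize_t n;

    assert(buf != NULL && buflen > 0);
    if (write(fd, data, len) != (ssize_t) len) {
        return -1;
    }
    if (lseek(fd, 0, SEEK_SET) < 0) {
        return -1;
    }
    n = read(fd, buf, buflen - 1);
    if (n >= 0) {
        buf[n] = '\0';
    }
    return n;
}
```

Because the content is only ever addressed through the descriptor, a crash cannot leave a named temp file behind, and no path-keyed or inode-keyed lookup is involved in returning it.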
Re: [AOLSERVER] Data corruption with fastpath caching
While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case. The intent of fastpath is to avoid reading a seldom-changed file from disk. It happens to be used in ns_returnfile since that is the normal use case. On unix the fastpath cache is keyed off the dev/inode, probably to keep the hash key shorter. Windows doesn't have device and inode numbers so it uses the filename as the hash key, so it wouldn't run into this problem.

From the server side, this could be fixed by:

- adding in the filename to the hash key or checking that it is the same
- making ns_unlink flush the entry from the fastpath cache
- restricting what fastpath will cache - e.g., don't cache anything in /var/tmp or /tmp or a configuration-specified directory
- adding a -nocache flag to ns_returnfile

All of these have pros and cons. I don't think your suggestion of waiting for cache entries to age a second or two would work well; it just moves the race condition around and adds a whole lot of disk activity when a busy server is warming up - static files might be read a few dozen times instead of once.

Fixing it from the application side is much easier. Just use ns_returnfp instead of ns_returnfile, on the open handle if you generated the file from tcl code and it's convenient to get the handle, otherwise by opening the file right there:

    ...generate file $myfile in some way...
    set fp [open $myfile]
    # ns_returnfp takes the number of bytes to send as its last argument
    ns_returnfp 200 text/plain $fp [file size $myfile]
    close $fp
    ns_unlink $myfile

You'd probably lose some efficiency by not mmap-ing the file, but that's likely to be noise compared to generating the file in the first place.

-J
Re: [AOLSERVER] Data corruption with fastpath caching
I would call that a security issue then. Leaking the wrong data to the wrong connection is pretty serious.

Jade

Jade Rubick
Director of Development
Truist
120 Wall Street, 4th Floor
New York, NY USA
[EMAIL PROTECTED]
+1 503 285 4963
+1 707 671 1333 fax

The information contained in this email/document is confidential and may be legally privileged. Access to this mail/document by anyone other than the intended recipient(s) is unauthorized. If you are not an intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance to it, is prohibited.
Re: [AOLSERVER] Data corruption with fastpath caching
Jade,

It is a security issue mostly because the code sequence is incorrect (which also means that ns_returnfile should not be used to return temp files).

The safe way to do this is to open the temp file, then immediately unlink it! Then write to the fd.

BTW, this same bug exists in the ns_form/ns_conn files code, which should use fd's instead of files. We need a little code cleanup here.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Monday 03:38 PM 8/18/2008, Jeff Rogers wrote:

> While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case. The intent of fastpath is to avoid reading a seldom-changed file from disk.

I'd agree that that's the intent, but the caching is hidden within ns_returnfile and it's not clear at all from the user's perspective that this alligator is lurking in the swamp. Using ns_returnfile in this way may not be the best approach in any particular situation, but it's nonetheless a completely valid usage and isn't contraindicated in any AOLserver docs I've seen. It's not difficult to come up with examples where it might happen, BTW...say, a web service that returns the result of an operating system command to a user.

I think Jade makes a good point that this is not only a bug but potentially a security issue.

> It happens to be used in ns_returnfile since that is the normal use case. On unix the fastpath cache is keyed off the dev/inode probably to keep the hash key shorter. Windows doesn't have device and inode numbers so it uses the filename as the hashkey, so it wouldn't run into this problem.

No, it can still easily run into this problem--it's just that the file name needs to be the same in both cases (which actually did apply in my client's case, and caused confusion in the early debugging of the problem, since the assumption was that using the same file name and/or path name was the source of the problem).

> From the server side, this could be fixed by:
> - adding in the filename to the hash key or checking that it is the same

No go, as observed above.

> - making ns_unlink flush the entry from the fastpath cache

Nope, since the file can be removed via (e.g.) exec rm.

> - restricting what fastpath will cache - e.g., don't cache anything in /var/tmp or /tmp or a configuration-specified directory.
> - adding a -nocache flag to ns_returnfile

This last is the one I'd considered as well, but the problem is that it puts the onus on the user to know that they should use the flag, and that's unlikely to be clear to them.

> I don't think your suggestion of waiting for cache entries to age a second or two would work well, it just moves the race condition around and adds a whole lot of disk activity when a busy server is warming up - static files might be read a few dozen times instead of once.

Nope, not at all. The only files that would get read more than once would be those that were served within one second of being generated--which wouldn't apply to any content that fits the definition of static. So this is actually a fairly non-intrusive fix. The main limitation is that it relies on the file timestamps and the server timestamps being synchronized, which may not always be true.

But I can't think of a better solution. Simply put, fastpath caching is inherently broken because it's not possible to guarantee that the file in question really should be served from cache (again, short of a cache-defeating checksum).

> Fixing it from the application side is much easier. Just use ns_returnfp instead of ns_returnfile, on the open handle if you generated the file from tcl code and it's convenient to get the handle, otherwise by opening the file right there:

Yep, and that's more or less the workaround I'd suggested to my client. But my point here wasn't to ask about potential workarounds but to highlight the issue itself, since I haven't seen it mentioned before.

- John
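The grace-period idea defended in this message (a file must be at least X seconds old before it may be served from cache) comes down to a small timestamp check. A minimal sketch in C, assuming whole-second timestamps; the helper name `old_enough_to_cache` and its treatment of a zero grace period as "check disabled" are assumptions of this sketch, not AOLserver behavior:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: may a file whose modification time is `mtime`
 * be served from cache at time `now`, given a grace period in seconds?
 */
static bool old_enough_to_cache(time_t mtime, time_t now, int grace_secs)
{
    assert(grace_secs >= 0);
    if (grace_secs == 0) {
        return true;            /* grace period disabled */
    }
    if (mtime > now) {
        return false;           /* mtime in the future: treat as too new */
    }
    return (now - mtime) >= (time_t) grace_secs;
}
```

With a grace period of 1, a file written during the current second is always read from disk, while a static file that has not changed for longer than that takes the cached path as before. As noted above, the check is only as good as the agreement between the file timestamps and the server clock.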
Re: [AOLSERVER] Data corruption with fastpath caching
On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:

> While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case.

I don't think it is a bug in fastpath. Think about the case where multiple logical files are actually the same physical file. Using the name would result in caching the same object under different names. This is a much more likely situation than this so-called bug.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On Mon, 2008-08-18 at 16:20 -0700, John Caruso wrote:

It's not difficult to come up with examples where it might happen, BTW...say, a web service that returns the result of an operating system command to a user.

The command is named ns_returnfile. The expectation is that you are returning a file, not a web service resource. The expectation is that the file will be around for longer than one second before being deleted and replaced. The fact that the documentation doesn't say this is unimportant.

Inodes are reused; this is part of how the filesystem works. You could run into the same problem with an archive program. A file of the same inode, name, size and age is created replacing the old file. Most archive programs would not understand that the file contents had changed. Is it a bug? No. It is called a practical limitation.

Anyway: no bug, just how it works. The only bug is how ns_returnfile is being used in the example.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
Consider this use case:

- You use git or another version control system to store a bunch of static html files you serve with AOLserver.
- You check out all of your static html files. Because they're all checked out at the same time, many of them have identical timestamps.

Could the user get the wrong version of an html file they're being served?

What about this scenario:

- You have a web application that allows administrators on various sites hosted on your application to download a list of user names and passwords (this is a slightly contrived example). They can download it to CSV.
- Admin #1 generates this file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-1234-passwords.csv. You return this file to the admin.
- Admin #2 generates their file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-5000-passwords.csv. You attempt to return this file to the admin. Because their request was in the same second, however, they get site-1234-passwords.csv?

Do I understand the problem correctly? I think both of these scenarios are pretty common examples of the way people use AOLserver currently, but I'm not sure if I'm understanding the bug correctly.

Jade

Jade Rubick
Director of Development
Truist
On Mon, Aug 18, 2008 at 4:20 PM, John Caruso [EMAIL PROTECTED]wrote: On Monday 03:38 PM 8/18/2008, Jeff Rogers wrote: While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case. The intent of fastpath is to avoid reading a seldom-changed file from disk. I'd agree that that's the intent, but the caching is hidden within ns_returnfile and it's not clear at all from the user's perspective that this alligator is lurking in the swamp. Using ns_returnfile in this way may not be the best approach in any particular situation, but it's nonetheless a completely valid usage and isn't contraindicated in any AOLserver docs I've seen. It's not difficult to come up with examples where it might happen, BTW...say, a web service that returns the result of an operating system command to a user. I think Jade makes a good point that this is not only a bug but potentially a security issue. It happens to be used in ns_returnfile since that is the normal use case. On unix the fastpath cache is keyed off the dev/inode probably to keep the hash key shorter. Windows doesn't have device and inode numbers so it uses the filename as the hashkey, so it wouldn't run into this problem. No, it can still easily run into this problem--it's just that the file name needs to be the same in both cases (which actually did apply in my client's case, and caused confusion in the early debugging of the problem, since the assumption was that using the same file name and/or path name was the source of the problem). From the server side, this could be fixed by: - adding in the filename to the hash key or checking that it is the same No go, as observed above. - making ns_unlink flush the entry from the fastpath cache Nope, since the file can be removed via (e.g.) exec rm. - restricting what fastpath will cache - e.g., don't cache anything in /var/tmp or /tmp or a configuration-specified directory. 
Re: [AOLSERVER] Data corruption with fastpath caching
John Caruso wrote: I'd agree that that's the intent, but the caching is hidden within ns_returnfile and it's not clear at all from the user's perspective that this alligator is lurking in the swamp. Using ns_returnfile in this way may not be the best approach in any particular situation, but it's nonetheless a completely valid usage and isn't contraindicated in any AOLserver docs I've seen. This then is the real fix: mention it in the docs. I put a blurb on the appropriate wiki pages; feel free to suggest something better :) The docs in the distribution should be updated too. It happens to be used in ns_returnfile since that is the normal use case. On unix the fastpath cache is keyed off the dev/inode probably to keep the hash key shorter. Windows doesn't have device and inode numbers so it uses the filename as the hashkey, so it wouldn't run into this problem. No, it can still easily run into this problem--it's just that the file name needs to be the same in both cases (which actually did apply in my client's case, and caused confusion in the early debugging of the problem, since the assumption was that using the same file name and/or path name was the source of the problem). The system needs to be free to do some things to improve performance with the understanding that the user needs to be aware of those things or risk bad behaviour. I wouldn't call it an unreasonable assumption that a file with the same name (and same modtime etc) is the same file. You can run into a very similar problem with NFS (i.e., attribute caching causing a modified file to appear not so) and people have learned to deal with that. - making ns_unlink flush the entry from the fastpath cache Nope, since the file can be removed via (e.g.) exec rm. True, but I'd still put this in the system needs to be able to ... category above. The system does some things and the developer should be aware of those things. 
I don't think your suggestion of waiting for cache entries to age a second or two would work well, it just moves the race condition around and adds a whole lot of disk activity when a busy server is warming up - static files might be read a few dozen times instead of once.

Nope, not at all. The only files that would get read more than once would be those that were served within one second of being generated--which wouldn't apply to any content that fits the definition of static.

It would work in your exact case, where the file is always removed immediately after being served and generated. But if not, it would still come up with the wrong answer.

13:50:21 - create file
13:50:21 - serve file (gets cached)
13:50:21 - delete file
13:50:21 - create file again (reuses inode)
... time passes ...
13:55:11 - serve file

In this case the file modtime is more than a few seconds old, but the cached mtime, inode, etc. are still matching the file on disk, so the stale cache entry would get delivered.

There is also at least one clever optimization where static content does get served within a second of being created, where the 404 page is used to generate something like an image from something like a database and writes it to a file where it is subsequently served by fastpath.

So this is actually a fairly non-intrusive fix. The main limitation is that it relies on the file timestamps and the server timestamps being synchronized, which may not always be true. But I can't think of a better solution. Simply put, fastpath caching is inherently broken because it's not possible to guarantee that the file in question really should be served from cache (again, short of a cache-defeating checksum).

The same can be said about nearly any caching system: it is unable to handle changes in the data that happen outside of the cache's control or knowledge. This is just the bargain you make when you use a cache.
But my point here wasn't to ask about potential workarounds but to highlight the issue itself, since I haven't seen it mentioned before.

I think you highlighting it is most of the fix. From there, get the caveat inserted into the documentation and the knowledge into the community so that the next person who runs into this problem will have an easier, or at least less frustrating time solving it.

-J
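The create/serve/delete/create timeline in the message above can be replayed against a toy cache. This is a simulation in Python under stated assumptions, not fastpath itself: `Stat` and `MetadataCache` are hypothetical names, the cache is keyed by (dev, inode) and validated by (mtime, size) as the thread describes, and no grace period is applied.

```python
from collections import namedtuple

# Hypothetical stand-in for the stat fields the thread says fastpath compares.
Stat = namedtuple("Stat", "dev ino mtime size")


class MetadataCache:
    """Toy cache keyed by (dev, ino) and validated by (mtime, size)."""

    def __init__(self):
        self.entries = {}

    def serve(self, st, read_from_disk):
        key = (st.dev, st.ino)
        hit = self.entries.get(key)
        if hit is not None and hit[0] == (st.mtime, st.size):
            return hit[1]  # metadata matches: cached bytes are served
        data = read_from_disk()  # miss or changed metadata: re-read the file
        self.entries[key] = ((st.mtime, st.size), data)
        return data


cache = MetadataCache()
t = 13 * 3600 + 50 * 60 + 21  # 13:50:21 as seconds since midnight

# 13:50:21 - create file, serve file (gets cached)
first = Stat(dev=1, ino=42, mtime=t, size=5)
served_1 = cache.serve(first, lambda: b"AAAAA")

# 13:50:21 - delete file, create file again: same inode, same second, same size
second = Stat(dev=1, ino=42, mtime=t, size=5)

# 13:55:11 - serve file: the metadata still matches, so the old bytes come back
served_2 = cache.serve(second, lambda: b"BBBBB")
print(served_1, served_2)
```

The second request never reaches the disk because nothing the cache checks has changed, even though the file contents have.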
Re: [AOLSERVER] Data corruption with fastpath caching
Tom Jackson wrote:

Think about the case where multiple logical files are actually the same physical file. Using the name would result in caching the same object under different names. This is a much more likely situation than this so called bug.

Huh, hard links - I sometimes forget about those. It's a much more believable reason (than my previous suggestion of shortening the key) for why the inode was used instead of the filename for the hash key.

-J
Re: [AOLSERVER] Data corruption with fastpath caching
On 19/08/2008, at 9:37 AM, Tom Jackson wrote:

On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:

While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case.

I don't think it is a bug in fastpath.

fastpath is making assumptions about what means something is the same file, and those assumptions are not consistent with unix filesystem semantics - how is this not a bug?

sure, the original use case that triggered this seems non-optimal, and could be done in other ways that don't trigger the bug, but that doesn't mean fastpath is behaving correctly...

Russell
Re: [AOLSERVER] Data corruption with fastpath caching
On Mon, 2008-08-18 at 16:56 -0700, Jade Rubick wrote:

Consider this use case:

* You use git or another version control system to store a bunch of static html files you serve with AOLserver.
* You check out all of your static html files. Because they're all checked out at the same time, many of them have identical timestamps.

Could the user get the wrong version of an html file they're being served?

No, because each file has a different inode. The bug requires that you create and destroy one file and create another one within one second (so they have the same timestamp); it also requires that the same inode is reused and that the file is exactly the same size. But beyond that, hopefully your git checkout will maintain the original timestamp with the file.

What about this scenario:

* You have a web application that allows administrators on various sites hosted on your application to download a list of user names and passwords (this is a slightly contrived example). They can download it to CSV.
* Admin #1 generates this file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-1234-passwords.csv. You return this file to the admin.
* Admin #2 generates their file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-5000-passwords.csv. You attempt to return this file to the admin. Because their request was in the same second, however, they get site-1234-passwords.csv?

Do I understand the problem correctly? I think both of these scenarios are pretty common examples of the way people use AOLserver currently, but I'm not sure if I'm understanding correctly the bug.

The filename doesn't matter, and neither does the source of the information. Two different requests could create files. The requirement is that the first is created and destroyed and the second is created within the same second as the first, reuses the inode, and has the exact same size.
This is why you should not use linked files (with path names) as temporary storage. Instead, open the file then unlink it (delete it from the filesystem), then use it via the fd.

In short: there is no bug.

tom jackson
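The open-then-unlink pattern recommended above can be sketched as follows. This is a POSIX-only illustration in Python rather than AOLserver Tcl, and the helper name `make_unlinked_tempfile` is hypothetical; on the AOLserver side the thread's suggestion is to hand the open handle to ns_returnfp instead of passing a path to ns_returnfile.

```python
import os
import tempfile


def make_unlinked_tempfile(data: bytes):
    """Create a temp file, remove its name at once, and keep only the fd.

    Hypothetical helper for illustration. After os.unlink() no path refers
    to the file, so nothing can look it up by name; the data stays
    reachable through the descriptor until it is closed (POSIX semantics).
    """
    fd, path = tempfile.mkstemp()
    os.unlink(path)               # the name is gone, the open file is not
    os.write(fd, data)
    os.lseek(fd, 0, os.SEEK_SET)  # rewind so the reader starts at byte 0
    return fd, path


fd, path = make_unlinked_tempfile(b"report for one request\n")
with os.fdopen(fd, "rb") as fh:   # closing the handle frees the storage
    body = fh.read()
print(body, os.path.exists(path))
```

As noted elsewhere in the thread, this is harder to apply when an external program insists on being given a filename to write to.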
Re: [AOLSERVER] Data corruption with fastpath caching
On Monday 04:56 PM 8/18/2008, Jade Rubick wrote:

Consider this use case:

* You use git or another version control system to store a bunch of static html files you serve with AOLserver.
* You check out all of your static html files. Because they're all checked out at the same time, many of them have identical timestamps.

Could the user get the wrong version of an html file they're being served?

Nope, because in this case the inodes for the files would be different, so fastpath caching would distinguish them.

What about this scenario:

* You have a web application that allows administrators on various sites hosted on your application to download a list of user names and passwords (this is a slightly contrived example). They can download it to CSV.
* Admin #1 generates this file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-1234-passwords.csv. You return this file to the admin.
* Admin #2 generates their file. You create a unique filename for their site_id, because you want a unique filename to return back to the user: site-5000-passwords.csv. You attempt to return this file to the admin. Because their request was in the same second, however, they get site-1234-passwords.csv?

Yep, it could happen in this case, assuming the files are deleted after they're returned to the user via ns_returnfile.

As I mentioned, this bug wasn't discovered through code review or any theoretical process--it was causing problems in live code, and it was extremely difficult to track down. And the damage assessment is still underway.

- John
Re: [AOLSERVER] Data corruption with fastpath caching
On 19/08/2008, at 10:13 AM, Jeff Rogers wrote: John Caruso wrote:

The system needs to be free to do some things to improve performance with the understanding that the user needs to be aware of those things or risk bad behaviour. I wouldn't call it an unreasonable assumption that a file with the same name (and same modtime etc) is the same file. You can run into a very similar problem with NFS (i.e., attribute caching causing a modified file to appear not so) and people have learned to deal with that.

the problem is that this can occur even if the filename is changed, and I'd argue that pretty convincingly violates the principle of least surprise.

yes, of course the system needs to make some assumptions about what it can optimise, but if the contents of /tmp/userinfo-71562 might get served back when I've asked for /tmp/userinfo-61453 then there's something wrong.

Russell
Re: [AOLSERVER] Data corruption with fastpath caching
On Tue, 2008-08-19 at 10:01 +1000, russell muetzelfeldt wrote:

On 19/08/2008, at 9:37 AM, Tom Jackson wrote:

On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:

While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case.

I don't think it is a bug in fastpath.

fastpath is making assumptions about what means something is the same file, and those assumptions are not consistent with unix filesystem semantics - how is this not a bug?

No, fastpath is making the exact same assumptions that any archive program would make, which is to record certain attributes at the time something is cached and then compare them with the same attributes at a later time. Unless you do a checksum or some other comparison, the cache system doesn't work very well for the intended purpose.

sure, the original use case that triggered this seems non-optimal, and could be done in other ways that don't trigger the bug, but that doesn't mean fastpath is behaving correctly...

The use case is a bug. You can't violate the essential granularity of the support system and call it a bug. The granularity is: inode, size, timestamp. Now, if we could just slow down AOLserver so that this never happens, that would be a great fix.

This is like claiming that a checksum collision is a bug. No, it is expected. We don't use things like checksums, or inode,size,time as a key as a guarantee of anything. They are a compromise, in other words, engineering.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
russell muetzelfeldt wrote:

On 19/08/2008, at 9:37 AM, Tom Jackson wrote:

On Mon, 2008-08-18 at 15:38 -0700, Jeff Rogers wrote:

While I'd agree this is a bug in fastpath, the real problem is that fastpath is being used at all in this case.

I don't think it is a bug in fastpath.

fastpath is making assumptions about what means something is the same file, and those assumptions are not consistent with unix filesystem semantics - how is this not a bug?

It's not a bug because no one ever said that it *was* strictly following unix filesystem semantics, which isn't even a single thing (ufs is slightly different than nfs, is slightly different than ext2 -noatime, is slightly different than afs, etc.) It is following a particular definition: if the file still exists and has the same dev/inode/mtime/size as it did when you last checked, then it is the same file. Think of it as an if-modified-since or if-none-match conditional GET.

It is a bug in that it's not what you expect. However in that case, the location of the bug is subject to debate.

-J

sure, the original use case that triggered this seems non-optimal, and could be done in other ways that don't trigger the bug, but that doesn't mean fastpath is behaving correctly...

Russell
Re: [AOLSERVER] Data corruption with fastpath caching
On Tue, 2008-08-19 at 10:39 +1000, russell muetzelfeldt wrote:

On 19/08/2008, at 10:13 AM, Jeff Rogers wrote: John Caruso wrote:

The system needs to be free to do some things to improve performance with the understanding that the user needs to be aware of those things or risk bad behaviour. I wouldn't call it an unreasonable assumption that a file with the same name (and same modtime etc) is the same file. You can run into a very similar problem with NFS (i.e., attribute caching causing a modified file to appear not so) and people have learned to deal with that.

the problem is that this can occur even if the filename is changed, and I'd argue that pretty convincingly violates the principle of least surprise.

yes, of course the system needs to make some assumptions about what it can optimise, but if the contents of /tmp/userinfo-71562 might get served back when I've asked for /tmp/userinfo-61453 then there's something wrong.

If it were not for the fact that the same system is entirely responsible for the situation, then I would agree. What you are really hoping for here is an idiot proof system. The big hole in the reasoning here is that the important thing is the file name with path, and that somehow this name is immutably linked to some content. This is delusion.

You want a transactional database but you are using a filesystem. Grow up.

BTW, fastpath has configuration parameters. Maybe bone up on those first.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On 19/08/2008, at 10:14 AM, Tom Jackson wrote:

No, fastpath is making the exact same assumptions that any archive program would make, which is to record certain attributes at the time something is cached and then compare them with the same attributes at a

Could the file name (just the name, not even the full path) not be added to the mix? Then using a random string as filename would make the problem go away, would it not?

Also, would it be possible to tell ns_returnfile to not use fastpath, if it is for one time use? The alternative in this scenario would of course be to simply read the file and just ns_return it.

Bas.
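Bas's question about adding the file name to the key can be explored with a small comparison. This is a hypothetical sketch in Python: `FileInfo`, `key_without_name` and `key_with_name` are invented for illustration and are not fastpath's data structures, and the paths are borrowed from Russell's example earlier in the thread.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical record of the attributes discussed in the thread."""
    name: str
    dev: int
    ino: int
    mtime: int
    size: int


def key_without_name(f: FileInfo):
    # Identity as the thread describes it today: dev/inode/mtime/size.
    return (f.dev, f.ino, f.mtime, f.size)


def key_with_name(f: FileInfo):
    # The proposed variant: the same attributes plus the file name.
    return (f.name,) + key_without_name(f)


# Two different temp files that reuse an inode within one second.
old = FileInfo("/tmp/userinfo-71562", 1, 42, 1219100000, 120)
new = FileInfo("/tmp/userinfo-61453", 1, 42, 1219100000, 120)
# A file recreated under the same name, same inode, same second, same size.
reused = FileInfo("/tmp/userinfo-71562", 1, 42, 1219100000, 120)

print(key_without_name(old) == key_without_name(new))  # collide without name
print(key_with_name(old) == key_with_name(new))        # distinct with name
print(key_with_name(old) == key_with_name(reused))     # same name collides
```

Under these assumptions, unique (for example random) file names do avoid the collision, while a reused name does not. The trade-off Tom raises also applies: a hard-linked file reached through two names would be cached twice.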
Re: [AOLSERVER] Data corruption with fastpath caching
On 19/08/2008, at 10:44 AM, Tom Jackson wrote:

On Tue, 2008-08-19 at 10:01 +1000, russell muetzelfeldt wrote:

sure, the original use case that triggered this seems non-optimal, and could be done in other ways that don't trigger the bug, but that doesn't mean fastpath is behaving correctly...

The use case is a bug. You can't violate the essential granularity of the support system and call it a bug. The granularity is: inode, size, timestamp. Now, if we could just slow down AOLserver so that this never happens, that would be a great fix.

yes, that's exactly what I said - fastpath should be removed.

snark aside, if I say ns_returnfile /tmp/foo-abcd but nsd sends the contents of the now-deleted /tmp/bar-wxyz to the client then it's not doing what I've explicitly asked, and it's a bug. just because the correct (imo) response is "tag WONTFIX, document as a gotcha, document workaround" doesn't mean that the behaviour is correct.

cheers

Russell
Re: [AOLSERVER] Data corruption with fastpath caching
Tom Jackson wrote:

No, because each file has a different inode. The bug requires that you create and destroy one file and create another one within one second (so they have the same timestamp) also required that the same inode is used and that the file is the same exact size. But beyond that, hopefully your git checkout will maintain the original timestamp with the file.

The bug conditions are actually slightly looser than this, because fastpath checks mtime and not ctime. So a malicious user (or your version control system, if it makes the local files have the same timestamps as those in the repo) could overwrite a file at any point in the future, utime() it back to the same time and fastpath would still consider it the same. So would any number of unix utilities, like rsync, tar, zip, etc.

Going back to my previous solutions, the only one on the server side that I still think is reasonable (names break hardlinks, cache flushing on unlink wasn't good in the first place, -nocache - why bother?) is to add a configuration option to exclude particular paths from fastpath. Actually not even a configuration option; that would involve a bit too much overhead for a marginal case; maybe a patch to fix this problem for users for whom it is a problem.

Using an unlinked file as a temporary is the right thing to do most of the time, but I imagine it could be difficult to do when you need to pass the filename around to uncooperative external programs.

-J
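The "overwrite, then utime() it back" case described above can be reproduced with ordinary files. This is a sketch in Python, assuming a POSIX-like filesystem; `identity` is a hypothetical helper that mimics the dev/inode/mtime/size comparison at one-second granularity and is not fastpath's code.

```python
import os
import tempfile


def identity(path):
    """Hypothetical identity tuple: dev, inode, whole-second mtime, size."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime_ns // 1_000_000_000, st.st_size)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "page.html")
    with open(path, "wb") as fh:
        fh.write(b"original")
    before = os.stat(path)
    key_before = identity(path)

    # Overwrite in place (same inode) with different bytes of the same length.
    with open(path, "r+b") as fh:
        fh.write(b"tampered")
    # Put the access and modification times back to their earlier values.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

    key_after = identity(path)
    with open(path, "rb") as fh:
        content_after = fh.read()

print(key_before == key_after, content_after)
```

Every attribute in the tuple is unchanged although the contents differ, so a check based only on those attributes cannot notice the rewrite. The ctime is not restored by utime(), which is why the message singles out the choice of mtime over ctime.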
Re: [AOLSERVER] Data corruption with fastpath caching
On Monday 05:13 PM 8/18/2008, Jeff Rogers wrote:

Simply put, fastpath caching is inherently broken because it's not possible to guarantee that the file in question really should be served from cache (again, short of a cache-defeating checksum).

The same can be said about nearly any caching system: it is unable to handle changes in the data that happen outside of the cache's control or knowledge. This is just the bargain you make when you use a cache.

I'd say "nearly any" is going too far, and in fact I'd say that for most caching systems to fail to return the correct data is a serious bug. The NFS example you bring up isn't really analogous since it's only about attributes, which are frequently not a concern; were NFS to return incorrect *data* for a file, that would be a serious bug. And in this case we're talking about a web server that may silently return data that's completely incorrect, which I'd say is very serious misbehavior.

The core problem here is that AOLserver is attempting to use the tuple of (dev, inode, mtime, size) as a unique determiner of a file's identity, and that's an inherently broken assumption--particularly so since the granularity of mtime is one second and inodes are reused on many filesystems (e.g. very common ones like ext3 and ufs).

I think you highlighting it is most of the fix. From there, get the caveat inserted into the documentation and the knowledge into the community so that the next person who runs into this problem will have an easier, or at least less frustrating time solving it.

That'd be an improvement over the current situation, but it's still the case that AOLserver as currently shipped has a file cache mechanism built into it which 1) may return incorrect data and 2) is enabled by default. Given the risk, I'd say fastpath caching should be disabled by default rather than enabled.
- John
Re: [AOLSERVER] Data corruption with fastpath caching
On 19/08/2008, at 11:06 AM, John Caruso wrote:

That'd be an improvement over the current situation, but it's still the case that AOLserver as currently shipped has a file cache mechanism built into it which 1) may return incorrect data and 2) is enabled by default. Given the risk, I'd say fastpath caching should be disabled by default rather than enabled.

if someone's application is at risk of triggering this behaviour, that'd just delay any problem until their load is high enough that they need to turn on fastpath - and surely that's an even worse scenario.

cheers

Russell
Re: [AOLSERVER] Data corruption with fastpath caching
On Tue, 2008-08-19 at 11:04 +1000, russell muetzelfeldt wrote:

snark aside, if I say ns_returnfile /tmp/foo-abcd but nsd sends the contents of the now-deleted /tmp/bar-wxyz to the client then it's not doing what I've explicitly asked, and it's a bug. just because the correct (imo) response is tag WONTFIX, document as a gotcha, document workaround doesn't mean that the behaviour is correct.

If your application wasn't the responsible party which violated the expectation you state, I would agree (maybe). The problem is that you think that the contents of a file remains unchanged as long as the filename itself remains unchanged.

Actually the problem is that someone is using a file to store volatile data and then feeding this file through a cache. You really need to think about this insanity. Because it is insanity.

1. You waste time writing data to a file.
2. You use ns_returnfile to send this data (reading from disk).
3. Fastpath puts this information into memory (taking space).
4. ns_returnfile uses the memory copy on later requests (but none valid).
5. meanwhile the file is deleted, cache still exists taking up memory.

The above are ideal conditions. The bug is not in ns_returnfile.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On 19/08/2008, at 10:56 AM, Tom Jackson wrote:

You want a transactional database but you are using a filesystem. Grow up.

and

If your application wasn't the responsible party which violated the expectation you state, I would agree (maybe).

please go and re-read this thread, and get your parties straight.
Re: [AOLSERVER] Data corruption with fastpath caching
On Tue, 2008-08-19 at 11:37 +1000, russell muetzelfeldt wrote:

> On 19/08/2008, at 10:56 AM, Tom Jackson wrote:
>
> > You want a transactional database but you are using a filesystem. Grow up.
>
> and
>
> > If your application wasn't the responsible party which violated the expectation you state, I would agree (maybe).
>
> please go and re-read this thread, and get your parties straight.

Sorry, I don't follow. Until someone explains to me why we need to be able to create and delete a file (then return it via fastpath), then create another file in the same second, I'll maintain that there is no bug in fastpath. The whole thing is a waste of time and space. We don't need to fix ns_returnfile so that it is easier to waste time or space.

tom jackson
Re: [AOLSERVER] Data corruption with fastpath caching
On 19/08/2008, at 11:59 AM, Tom Jackson wrote:

> On Tue, 2008-08-19 at 11:37 +1000, russell muetzelfeldt wrote:
>
> > On 19/08/2008, at 10:56 AM, Tom Jackson wrote:
> >
> > > You want a transactional database but you are using a filesystem. Grow up.
> >
> > and
> >
> > > If your application wasn't the responsible party which violated the expectation you state, I would agree (maybe).
> >
> > please go and re-read this thread, and get your parties straight.
>
> Sorry, I don't follow.

ok, I'll spell it out. it's not my application that's violated the expectation I state. you haven't been paying attention to the From: headers, and seem to have mistaken me for the original poster of this thread.

all I've been saying is that "ns_returnfile filename" returning the content of something other than filename, contrary to the documentation and common sense, is a bug. given that fastpath exists for a (good) reason, and that the behaviour which triggers the bug is marginal anyway, the correct response is "the bug will not be fixed, here's why, and here's how to work around it".

cheers

Russell
Re: [AOLSERVER] Data corruption with fastpath caching
On Monday 06:21 PM 8/18/2008, russell muetzelfeldt wrote:

> On 19/08/2008, at 11:06 AM, John Caruso wrote:
>
> > That'd be an improvement over the current situation, but it's still the case that AOLserver as currently shipped has a file cache mechanism built into it which 1) may return incorrect data and 2) is enabled by default. Given the risk, I'd say fastpath caching should be disabled by default rather than enabled.
>
> if someone's application is at risk of triggering this behaviour, that'd just delay any problem until their load is high enough that they need to turn on fastpath - and surely that's an even worse scenario.

I'd say it's still better, because it requires explicit action on the user's part to enable the flawed caching mechanism in that case. And actually I don't think fastpath in its default configuration would be of much help in performance terms these days, given that the cache is only 5MB in size and file data is typically cached by the OS anyway (and servers generally have far more RAM than they did even five years ago).

I do think this should have been considered (and steps taken to address it) when the fastpath caching mechanism was initially developed, since it's a glaring flaw. I've designed things that rely on shaky underlying assumptions in the past, but only in controlled circumstances where those assumptions were guaranteed to obtain. I can think of situations in which a caching mechanism with this type of design limitation wouldn't be an issue, but in my opinion it has no place being a default-enabled mechanism in an enterprise-grade web server.

- John