On 02.12.2014, at 20:47, Marshall Schor <[email protected]> wrote:

> Assuming we have "normally" the "automatic" style in effect, then yes, inside
> the protection block the automatic style would be temporarily disabled. The
> "removes" would still be done, but the info needed to do the addbacks would be
> kept. So, at the point of a feature update, the remove (only if needed, of
> course) would be done, and the update, but not the "add-back". (Remember, that
> doing the update before the remove causes index corruption.) For the second and
> subsequent update to (another) feature of that FS, no index operations would be
> done (it would already be "removed"). And then at the end, only the re-adding
> of whatever was removed would be done.

I wonder if this would lead to surprising behaviour, in the sense that a second
query over an index happening within the first one would suddenly not see the
modified items anymore, because they have been removed as part of an update and
only get added back at the end of the protection block. But then again, since we
assume this is the unusual case (and I think it is), this would just be
something users need to be aware of.

>> A flow controller or the component base classes could forcibly put the CAS back
>> into protection mode in case that the component coder forgot it (and log a
>> warning) - or it could even throw an exception in such a case.
>
> This would be a partial solution, because there are cases where there would not
> be a flow controller involved, or even a base class. A complete solution is to
> have the API follow the style of using an inner class as discussed in previous
> notes in this change.

I don't claim it would be a solution. It would just be a precaution to alert
users. A logged warning could look like this:

  WARNING: CAS index protection disabled after processing - please use a
  try/finally construct to ensure the protection mode is reenabled.

The inner class would be implemented in terms of the beginProtection() and
endProtection() methods and use the try/finally construction. Using it would be
the recommended approach, but there might be cases where a user has a strange
control flow or wants to use checked exceptions and would prefer calling
beginProtection()/endProtection() manually.

> This "bulk" mode, though, I think would be the exception, because most users set
> lots of features when a new FS is created, but then the "typical" mode is to
> update just a few, (I'm guessing :-) ). If this is true, then the "automatic"
> mode (discussed at the top) would work, and the "bulk" mode would be relegated
> to just an optimization for a less-usual case.
> So, most people would not need to do anything, and their code would start
> working without corrupting indices.

For the use-cases I know, setting features before adding to the index is
definitely the common case - updating features is the rare case. I have the
feeling that some concepts of UIMA are somewhat more geared towards immutable
FSes than mutable FSes, but this appears to be changing now.

Isn't it the case that feature updates are also a problem for delta CAS, because
the delta mechanism doesn't notice the update? I mean, another approach to what
is currently being done would be to create a copy of an FS whenever a feature is
updated, remove the original from the indexes and add the new one - but then
we'd run into other problems...

I think Peter mentioned that Ruta does update features as part of its normal
operation. But that's probably one of the very special use-cases that is rarely
encountered and where the bulk mode fits in nicely.

Cheers,

-- Richard
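
For illustration, the protection block discussed above could be sketched roughly
as follows. This is a toy model and not UIMA code: the classes Fs and
SortedIndex and the methods setBegin() and protect() are made up for the
example, and beginProtection()/endProtection() only mirror the names proposed in
this thread. It assumes a single sorted index and no nested protection blocks.

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class ProtectionSketch {

    /** Hypothetical stand-in for a feature structure with one indexed feature. */
    static final class Fs {
        final int id;
        int begin; // the feature used as sort key
        Fs(int id, int begin) { this.id = id; this.begin = begin; }
    }

    /** Hypothetical sorted index; an assumption for illustration, not the UIMA API. */
    static final class SortedIndex {
        private final TreeSet<Fs> items = new TreeSet<>(
            Comparator.comparingInt((Fs f) -> f.begin).thenComparingInt(f -> f.id));
        private final Set<Fs> removed = new LinkedHashSet<>(); // pending add-backs
        private boolean inBlock = false;

        void add(Fs fs) { items.add(fs); }
        boolean contains(Fs fs) { return items.contains(fs); }

        void beginProtection() { inBlock = true; }

        /** Re-adds whatever was removed during the block. */
        void endProtection() {
            items.addAll(removed);
            removed.clear();
            inBlock = false;
        }

        /** Removes before updating; adds back immediately only outside a block. */
        void setBegin(Fs fs, int value) {
            boolean wasIndexed = items.remove(fs); // must happen BEFORE the update
            fs.begin = value;
            if (wasIndexed) {
                if (inBlock) removed.add(fs); // defer the add-back
                else items.add(fs);           // "automatic" mode
            }
            // If the FS was already removed in this block, nothing else is done.
        }

        /** Wrapper in the spirit of the "inner class" style: try/finally inside. */
        void protect(Runnable updates) {
            beginProtection();
            try {
                updates.run();
            } finally {
                endProtection();
            }
        }
    }

    public static void main(String[] args) {
        SortedIndex index = new SortedIndex();
        Fs fs = new Fs(1, 10);
        index.add(fs);
        index.protect(() -> {
            index.setBegin(fs, 20); // removed from the index here
            index.setBegin(fs, 30); // no index operation, already removed
            System.out.println(index.contains(fs)); // false
        });
        System.out.println(index.contains(fs)); // true
    }
}
```

The first println shows the effect I was wondering about: a query made inside
the block does not see the modified FS until the block ends. Because protect()
uses try/finally, the add-back also happens when the updates throw an unchecked
exception.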
