On 12/2/2014 11:51 AM, Richard Eckart de Castilho wrote:
> On 02.12.2014, at 17:14, Marshall Schor <[email protected]> wrote:
>
>> A subsequent discussion with Burn L. produced the following two good ideas:
>>
>> 1) The UIMA framework could automatically do the safe thing on each feature
>> modification that required it. Although this might seem inefficient, it is
>> likely that in most cases, only one feature (used as a key in some index spec)
>> is being modified at any one time. For those cases where this isn't true, the
>> alternative of an index protection block encapsulating multiple updates could be
>> used; but it's likely that would rarely be needed.
>>
>> The automatic approach would, in effect, do a remove, modify, add-back cycle for
>> each feature modification, in all indices where the FS was in the index, if the
>> feature was used as a key.
>>
>> This would be a boon to users - as their code would now work without the danger
>> of accidentally corrupting indices.
> Sounds good :)
>
> So by default, the CAS would protect itself. When a protection block (I cannot
> help thinking of this as a kind of transaction) is used, then the protection
> would be temporarily disabled and the modifications would be written to some
> kind of "transaction log". When the block is closed, the log is "committed",
> basically removing/readding all the modified FSes. Did I paraphrase this
> correctly?

Assuming we have "normally" the "automatic" style in effect, then yes, inside the
protection block the automatic style would be temporarily disabled.

The "removes" would still be done, but the info needed to do the add-backs would be
kept. So, at the point of a feature update, the remove (only if needed, of course)
would be done, and then the update, but not the "add-back". (Remember that doing
the update before the remove causes index corruption.)

For the second and subsequent updates to (another) feature of that FS, no index
operations would be done (it would already be "removed"). And then at the end,
only the re-adding of whatever was removed would be done.

>
> A flow controller or the component base classes could forcibly put the CAS back
> into protection mode in case that the component coder forgot it (and log a
> warning) - or it could even throw an exception in such a case.

This would be a partial solution, because there are cases where there would not be
a flow controller involved, or even a base class. A complete solution is to have
the API follow the style of using an inner class as discussed in previous notes in
this change.

This "bulk" mode, though, I think would be the exception, because most users set
lots of features when a new FS is created, but then the "typical" mode is to update
just a few (I'm guessing :-) ). If this is true, then the "automatic" mode
(discussed at the top) would work, and the "bulk" mode would be relegated to just
an optimization for a less-usual case.

So, most people would not need to do anything, and their code would start working
without corrupting indices.

-Marshall

>
>> 2) Because this would turn a feature update into (potentially) a remove - update
>> - add operation, users writing feature updates inside an iterator would be
>> exposed to suddenly getting illegal index modification while iterating
>> exceptions.
>>
>> This has long been an issue, I think, causing users to write loops that extract
>> FSs into array lists and then iterate over those, while doing UIMA index
>> adds/removes.
> Totally :)
>
>> How about we add a method to our iterator creation suite, perhaps named
>> safeIterator(), which creates a snapshot of the index it's iterating over at the
>> start, and then allows the user code to do arbitrary index adds/removes?
> Sounds good as well. I think that some UIMA core iterator already copies FSes to
> some collection before returning it. Some of the uimaFIT select*() methods
> certainly do this (but not all - and it is not advertised to users).
>
>> It seems this occurs frequently enough to warrant UIMA built-in support, and some
>> optimizations may be available. It seems it could be especially helpful if (1)
>> were implemented, because the remove/add could occur unbeknownst to the user.
>> For example, the component writer may not have had a feature in any index, but
>> when his component was combined with others, an index could have been added that
>> used the feature.
>>
>> WDYT?
> It is probably not a common problem, but from the perspective of the
> architecture, it would be good to avoid negative side-effects from a third
> component adding an index that could cause undesired or even wrong behavior.
>
> Cheers,
>
> -- Richard
>
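
For readers who want to see the two ideas from this thread in miniature, here is a rough sketch in plain Java. It does not use the UIMA API at all: a java.util.TreeSet stands in for a sorted index, and every name in it (IndexProtectionSketch, Item, setBeginSafely, and so on) is hypothetical and made up for illustration. It only tries to show (a) why a key feature must be removed from a sorted structure before it is modified and added back afterwards, and (b) why iterating over a snapshot lets the loop body do removes and adds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.TreeSet;

public class IndexProtectionSketch {

    /** Hypothetical stand-in for an FS; 'begin' plays the role of a feature used as an index key. */
    static final class Item {
        int begin;
        final String name;
        Item(int begin, String name) { this.begin = begin; this.name = name; }
    }

    /** Toy sorted index with three items, keyed on 'begin' and then 'name' (not a UIMA index). */
    static TreeSet<Item> sampleIndex() {
        TreeSet<Item> index = new TreeSet<>(
                Comparator.<Item>comparingInt(i -> i.begin).thenComparing(i -> i.name));
        index.add(new Item(10, "a"));
        index.add(new Item(20, "b"));
        index.add(new Item(30, "c"));
        return index;
    }

    /** Remove, modify, add back: the removal happens while the old key is still in place. */
    static void setBeginSafely(TreeSet<Item> index, Item item, int newBegin) {
        boolean wasIndexed = index.remove(item);
        item.begin = newBegin;
        if (wasIndexed) {
            index.add(item);
        }
    }

    /** Updating the key in place leaves the item where its old key put it, so a lookup misses it. */
    static boolean inPlaceUpdateLosesItem() {
        TreeSet<Item> index = sampleIndex();
        Item first = index.first();
        first.begin = 40; // no remove and no add-back
        return !index.contains(first);
    }

    /** With the remove / modify / add-back cycle the item stays findable and correctly ordered. */
    static boolean safeUpdateKeepsItem() {
        TreeSet<Item> index = sampleIndex();
        Item first = index.first();
        setBeginSafely(index, first, 40);
        return index.contains(first) && index.last() == first && index.size() == 3;
    }

    /** Doing the cycle while iterating the index directly trips the fail-fast iterator. */
    static boolean directIterationFails() {
        TreeSet<Item> index = sampleIndex();
        try {
            for (Item item : index) {
                setBeginSafely(index, item, item.begin + 100);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Iterating over a snapshot copy allows arbitrary removes and adds in the loop body. */
    static int snapshotIterationVisits() {
        TreeSet<Item> index = sampleIndex();
        int visited = 0;
        for (Item item : new ArrayList<>(index)) {
            setBeginSafely(index, item, item.begin + 100);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("in-place update loses item: " + inPlaceUpdateLosesItem());
        System.out.println("safe update keeps item: " + safeUpdateKeepsItem());
        System.out.println("direct iteration fails: " + directIterationFails());
        System.out.println("snapshot iteration visits: " + snapshotIterationVisits());
    }
}
```

The snapshot copy costs one pass over the index up front, which is the same trade-off as the "extract FSs into array lists" workaround mentioned above; a built-in version could presumably avoid some of that cost, but this sketch makes no attempt to.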
