If we did this, then the "optional" check for corrupting indices would go away,
and there may be a slight performance hit, as this would always be enabled.
The performance hit for updating a set of features in a new FS before adding it
to the index should be unmeasurable, I think. But would be worth measuring...
-Marshall
On 12/2/2014 11:14 AM, Marshall Schor wrote:
> A subsequent discussion with Burn L. produced the following two good ideas:
>
> 1) The UIMA framework could automatically do the safe thing on each feature
> modification that required it. Although this might seem inefficient, it is
> likely that in most cases, only one feature (used as a key in some index spec)
> is being modified at any one time. For those cases where this isn't true, the
> alternative of an index protection block encapsulating multiple updates could
> be used; but it's likely that would rarely be needed.
>
> The automatic approach would, in effect, do a remove, modify, add-back cycle
> for each feature modification, in all indices where the FS was in the index,
> if the feature was used as a key.
>
> This would be a boon to users - as their code would now work without the
> danger of accidentally corrupting indices.
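The remove, modify, add-back cycle from point (1) can be illustrated with a toy sorted index. This is only a sketch under stated assumptions: ToyFs, ToySortedIndex and setBeginSafely are hypothetical names invented for illustration, not UIMA classes or methods, and a java.util.TreeSet stands in for a sorted index whose key is the "begin" feature.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for a feature structure with one key feature.
final class ToyFs {
    int begin;                 // feature used as the sort key of the index
    final String label;        // tie-breaker so distinct FSs never compare equal
    ToyFs(int begin, String label) { this.begin = begin; this.label = label; }
}

// Hypothetical stand-in for a sorted index keyed on ToyFs.begin.
final class ToySortedIndex {
    private static final Comparator<ToyFs> BY_BEGIN_THEN_LABEL = (x, y) ->
        x.begin != y.begin ? Integer.compare(x.begin, y.begin)
                           : x.label.compareTo(y.label);

    private final TreeSet<ToyFs> set = new TreeSet<>(BY_BEGIN_THEN_LABEL);

    void add(ToyFs fs) { set.add(fs); }
    boolean contains(ToyFs fs) { return set.contains(fs); }
    String firstLabel() { return set.first().label; }

    // The "safe thing": remove, modify, add back (only if the FS was indexed).
    void setBeginSafely(ToyFs fs, int newBegin) {
        boolean wasIndexed = set.remove(fs); // must run while the key is still the old value
        fs.begin = newBegin;
        if (wasIndexed) {
            set.add(fs);                     // re-inserted at its new sort position
        }
    }
}

final class AutoProtectDemo {
    public static void main(String[] args) {
        ToySortedIndex index = new ToySortedIndex();
        ToyFs a = new ToyFs(10, "a");
        ToyFs b = new ToyFs(20, "b");
        index.add(a);
        index.add(b);

        index.setBeginSafely(a, 30);             // a now sorts after b
        System.out.println(index.contains(a));   // true
        System.out.println(index.firstLabel());  // b
    }
}
```

Writing a.begin = 30 directly, without the remove and add-back, would leave the element at its old position in the tree, which is the kind of index corruption the thread is trying to rule out.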
>
> 2) Because this would turn a feature update into (potentially) a remove -
> update - add operation, users writing feature updates inside an iterator
> loop would suddenly be exposed to "illegal index modification while
> iterating" exceptions.
>
> This has long been an issue, I think, causing users to write loops that
> extract FSs into array lists and then iterate over those, while doing UIMA
> index adds/removes.
>
> How about we add a method to our iterator creation suite, perhaps named
> safeIterator(), which creates a snapshot of the index it's iterating over at
> the start, and then allows the user code to do arbitrary index adds/removes?
> It seems this occurs frequently enough to warrant UIMA built-in support, and
> some optimizations may be available. It seems it could be especially helpful
> if (1) were implemented, because the remove/add could occur unbeknownst to
> the user. For example, the component writer may not have had a feature in
> any index, but when his component was combined with others, an index could
> have been added that used the feature.
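The snapshot idea behind the proposed safeIterator() can be sketched with plain Java collections. The helper below is hypothetical (the name is borrowed from the proposal, but the signature and the use of an ArrayList as the "index" are assumptions for illustration); it only shows the copy-then-iterate pattern that users currently write by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class SnapshotIteration {
    // Hypothetical helper: copies the live contents up front, so the loop
    // body may add to or remove from the live collection.
    static <T> Iterator<T> safeIterator(Iterable<T> liveIndex) {
        List<T> snapshot = new ArrayList<>();
        for (T item : liveIndex) {
            snapshot.add(item);
        }
        return snapshot.iterator();  // iterates the copy, not the live index
    }

    // Removes every even number from the live list while iterating a snapshot.
    static List<Integer> removeEvens(List<Integer> live) {
        Iterator<Integer> it = safeIterator(live);
        while (it.hasNext()) {
            Integer value = it.next();
            if (value % 2 == 0) {
                // remove(Object); doing this while iterating 'live' directly
                // would typically fail with ConcurrentModificationException
                live.remove(value);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<Integer> live = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(removeEvens(live)); // [1, 3]
    }
}
```

The cost is one copy of the index contents per iteration, which is where the "some optimizations may be available" remark would apply.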
>
> WDYT?
>
> -Marshall
>
> On 12/2/2014 10:26 AM, Marshall Schor wrote:
>> Richard, your good feedback set me thinking harder about this. I think I
>> agree that the try / finally is, in some sense, optional, and serves (only)
>> to execute the finally block in the presence of an exception.
>>
>> So, if we were to envisage a design without that, it might look as simple as
>> this:
>>
>> // put this at the start of a sequence where some index modifications might
>> // cause index corruption
>> cas.beginIndexProtection();
>>
>> // User code, which does some modifications to existing FSs
>> // However, UIMA iterators which check for fast-fail may fail in this
>> // section
>>
>> // indicate the end of the index protected sequence
>> cas.endIndexProtection();
>>
>> Of course, you could optionally use a Java 7 or 8 try / finally to execute
>> the last line in the presence of an exception.
>>
>> Some notes:
>>
>> 1) The protection might remove from the indices some FSs being modified, as
>> needed, and then add them back at the end, which could cause iterators
>> already in existence that move to other than first or last to fail.
>>
>> 2) The protection might have some alternative style for Java 8 to facilitate
>> the try-with-resources. It might look like this:
>>
>> try (IndexProtection ip = cas.beginIndexProtection()) {
>> // user code
>> }
>>
>> where the IndexProtection class has a "close()" method which does the
>> cas.endIndexProtection() call.
>>
>> 3) I think the design ought to support nested begin/end protection blocks, to
>> facilitate subroutine modularity. For example, a separately developed
>> subroutine might be called by user code; this subroutine might, in turn, do
>> its own IndexProtection.
>>
>> 4) There is a danger with this design, in that a user could "forget" to put
>> the endIndexProtection in.
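Notes 2) and 3) can be combined in one small sketch: an AutoCloseable handle whose close() ends the protection, plus a depth counter so that only the outermost block triggers the add-back. Everything here is an assumption for illustration: ToyCas is a stand-in class, the counter-based nesting is just one possible design, and the add-back itself is reduced to a counter. The try-with-resources statement used below is available from Java 7 onward.

```java
// Hypothetical stand-in for the CAS; not the UIMA API.
final class ToyCas {
    private int depth = 0;     // number of protection blocks currently open
    private int addBacks = 0;  // how many times the deferred add-back ran

    IndexProtection beginIndexProtection() {
        depth++;
        return new IndexProtection(this);
    }

    void endIndexProtection() {
        if (depth == 0) {
            throw new IllegalStateException("endIndexProtection without begin");
        }
        depth--;
        if (depth == 0) {
            addBacks++;        // only the outermost end re-adds removed FSs
        }
    }

    int depth() { return depth; }
    int addBacks() { return addBacks; }
}

// Handle whose close() performs the endIndexProtection() call.
final class IndexProtection implements AutoCloseable {
    private final ToyCas cas;
    private boolean closed = false;

    IndexProtection(ToyCas cas) { this.cas = cas; }

    @Override
    public void close() {      // declares no checked exception, so no catch is needed
        if (!closed) {         // closing twice is harmless
            closed = true;
            cas.endIndexProtection();
        }
    }
}

final class NestedProtectionDemo {
    // Plays the role of a separately developed subroutine with its own block.
    static void subroutine(ToyCas cas) {
        try (IndexProtection inner = cas.beginIndexProtection()) {
            // protected feature updates would go here
        }
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        try (IndexProtection outer = cas.beginIndexProtection()) {
            subroutine(cas);
            System.out.println(cas.addBacks()); // 0: inner close did not add back
        }
        System.out.println(cas.addBacks());     // 1: outermost close did
    }
}
```

With this form the compiler-generated close() call covers note 4) for callers who use try-with-resources, although nothing forces them to use it.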
>>
>> -----------------------
>>
>> A safer design would be along the lines that Richard suggested - of using
>> "functional" (in Java 8 sense - having only one method) inner classes that
>> encapsulate the user code.
>>
>> With respect to how to handle checked exceptions: in Java 8, you can declare
>> a "custom" functional interface that includes a throws clause, and use that
>> to allow the lambda to throw checked exceptions. But you would probably end
>> up (since the user code might throw almost anything, from UIMAException to
>> IOException) saying it throws the top superclass of checked exceptions:
>> Exception. That would require the caller to catch Exception (or include it
>> in a throws clause).
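A minimal sketch of the custom functional interface described above, to make the trade-off concrete. ProtectedAction and this protectIndices are hypothetical names, and the begin/end of the protection is reduced to a counter; the point is only that a throws clause on the single method lets the lambda throw checked exceptions, at the price of the caller handling Exception.

```java
import java.io.IOException;

// Hypothetical functional interface; the throws clause is what allows a
// lambda body to throw checked exceptions.
@FunctionalInterface
interface ProtectedAction {
    void run() throws Exception;
}

final class CheckedProtection {
    static int endCalls = 0;   // stands in for the end-of-protection add-back

    static void protectIndices(ProtectedAction action) throws Exception {
        // the begin-protection step would go here
        try {
            action.run();
        } finally {
            endCalls++;        // always runs, even when the action throws
        }
    }

    public static void main(String[] args) {
        try {
            protectIndices(() -> { throw new IOException("disk problem"); });
        } catch (Exception e) {                 // caller must handle the broad type
            System.out.println(e.getMessage()); // disk problem
        }
        System.out.println(endCalls);           // 1
    }
}
```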
>>
>> The alternative would be to not have checked exceptions, and to require the
>> user code to encapsulate these as runtime exceptions. For this particular
>> use-case, the setting of feature structure slots doesn't throw checked
>> exceptions, I think.
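For comparison, here is a sketch of the alternative just described, where the callback is a plain Runnable and user code wraps any checked exception in a runtime exception. The class and method names are hypothetical, loadValue is an invented example of user code that throws a checked exception, and the protection itself is again reduced to a counter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class UncheckedProtection {
    static int endCalls = 0;   // stands in for the end-of-protection add-back

    static void protectIndices(Runnable action) {
        // the begin-protection step would go here
        try {
            action.run();
        } finally {
            endCalls++;        // always runs, even when the action throws
        }
    }

    // Invented example of user code that throws a checked exception.
    static void loadValue(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("cannot read value");
        }
    }

    public static void main(String[] args) {
        try {
            protectIndices(() -> {
                try {
                    loadValue(true);
                } catch (IOException e) {
                    throw new UncheckedIOException(e); // wrap, keeping the cause
                }
            });
        } catch (UncheckedIOException e) {
            System.out.println(e.getCause().getMessage()); // cannot read value
        }
    }
}
```

The caller of protectIndices needs no throws clause here; the extra try / catch moves into the rare user code that actually deals with checked exceptions.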
>>
>> So, for that design, it would look like Richard suggested.
>>
>> // slight change of name, to indicate the function being supplied in the
>> // encapsulation
>> // Java 8
>>
>> cas.protectIndices( () -> {
>> // user code , no checked exceptions
>> });
>>
>> // Java 7
>>
>> cas.protectIndices( new Runnable() { public void run() {
>> // user code, no checked exceptions
>> }});
>>
>> So, I'm now leaning toward this style (with no checked exceptions) as
>> suggested by Richard, as the most reliable way: it prevents the user from
>> "forgetting" to invoke endIndexProtection(), and is not too cumbersome to
>> write even in Java 7.
>>
>> Other opinions?
>>
>> -Marshall
>>
>>
>> On 12/1/2014 6:30 PM, Marshall Schor wrote:
>>> The anonymous inner class has a nice property that with Java 8 you can use
>>> lambdas.
>>>
>>> A problem, though, I think is how to nicely handle thrown checked
>>> exceptions. With lambdas, I think you can't have checked exceptions. With
>>> anonymous inner classes, you can. But of course the syntax is more
>>> difficult to understand.
>>>
>>> I'm not sure about the abuse part for try / finally. (I'm not using try /
>>> catch :-) ). The try / finally is for the purpose of putting a block scope
>>> around some code, and then executing some code at the end of a block, even
>>> if an exception is thrown.
>>>
>>> I need the signal of where the end of the block is, and to execute code
>>> there, in order to add-back any FSs that might have been removed (if
>>> needed) in the body of the code while doing the feature updates.
>>>
>>> It seems that the try / finally (or Java 8's try with resources) has a
>>> clearer syntax for specifying this than anything else I've thought of (but
>>> maybe there's still a better way :-) ).
>>>
>>> -Marshall
>>>
>>> On 12/1/2014 4:09 PM, Richard Eckart de Castilho wrote:
>>>> On 01.12.2014, at 19:24, Marshall Schor <[email protected]> wrote:
>>>>
>>>>> One approach would use the try/ finally form:
>>>>>
>>>>> controlVar = cas.startUimaIndexProtectedBlock();
>>>>> try {
>>>>> // some code which modifies a FS (or maybe, multiple FSs)
>>>>> } finally {
>>>>> controlVar.close(); // causes any "removes" to be now re-added to
>>>>> // indices
>>>>> }
>>>>>
>>>>> A form like the above could use in Java 8 the simpler try-with-resources
>>>>> form:
>>>>> try (controlVar = cas.startUimaIndexProtectedBlock()) {
>>>>> // some code which modifies a FS (or maybe, multiple FSs)
>>>>> }
>>>> For me, this smells a bit like abusing try/catch, although I admit that
>>>> it also has some elegance.
>>>>
>>>> Why not use an anonymous inner class like this:
>>>>
>>>> cas.transaction(new Transaction<CAS>() {
>>>> void perform(CAS cas) {
>>>> // make modifications
>>>> }
>>>> });
>>>>
>>>> Afaik this works also in Java versions prior to 7. It's the kind of thing
>>>> one did before lambda arrived.
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>>>
>>
>
>