Re: [jira] [Created] (UIMA-4135) support for modifying indexed FSs

Marshall Schor Tue, 02 Dec 2014 08:50:37 -0800

If we did this, then the "optional" check for corrupting indices would go away,
and there may be a slight performance hit, as this would always be enabled.


The performance hit for updating a set of features in a new FS before adding it
to the index should be unmeasurable, I think. But it would be worth measuring...

-Marshall


On 12/2/2014 11:14 AM, Marshall Schor wrote:
> A subsequent discussion with Burn L. produced the following two good ideas:
>
> 1) The UIMA framework could automatically do the safe thing on each feature
> modification that required it.  Although this might seem inefficient, it is
> likely that in most cases, only one feature (used as a key in some index spec)
> is being modified at any one time.  For those cases where this isn't true, the
> alternative of an index protection block encapsulating multiple updates could
> be used; but it's likely that would rarely be needed.
>
> The automatic approach would, in effect, do a remove, modify, add-back cycle
> for each feature modification, in all indices where the FS was in the index,
> if the feature was used as a key.
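
A minimal sketch of the remove, modify, add-back cycle, using a plain java.util.TreeSet as a stand-in for a UIMA sorted index. The Token class, its begin key, and the setBegin helper are hypothetical illustrations, not UIMA API; the point is only that the FS has to be removed while its old key is still in place, and re-added after the key changes.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for a feature structure whose "begin" feature
// is used as a key by an index.
class Token {
    int begin;
    final String text;

    Token(int begin, String text) {
        this.begin = begin;
        this.text = text;
    }
}

class KeyedUpdateDemo {
    // Ordering of the toy "index": by begin, then by text to break ties.
    static final Comparator<Token> BY_BEGIN = (x, y) -> {
        int c = Integer.compare(x.begin, y.begin);
        return c != 0 ? c : x.text.compareTo(y.text);
    };

    // Hypothetical helper: the remove, modify, add-back cycle the framework
    // would perform automatically when a key feature of an indexed FS changes.
    static void setBegin(TreeSet<Token> index, Token t, int newBegin) {
        boolean wasIndexed = index.remove(t); // remove while the old key is still valid
        t.begin = newBegin;                   // now it is safe to change the key
        if (wasIndexed) {
            index.add(t);                     // re-insert at the position for the new key
        }
    }
}
```

Assigning t.begin directly while t is in the set would leave it filed under its old position, so later lookups could miss it; that is the index corruption the automatic cycle is meant to prevent.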
>
> This would be a boon to users, as their code would now work without the
> danger of accidentally corrupting indices.
>
> 2) Because this would turn a feature update into (potentially) a
> remove-update-add operation, users writing feature updates inside an iterator
> would be exposed to suddenly getting "illegal index modification while
> iterating" exceptions.
>
> This has long been an issue, I think, causing users to write loops that
> extract FSs into array lists and then iterate over those, while doing UIMA
> index adds/removes.
>
> How about we add a method to our iterator creation suite, perhaps named
> safeIterator(), which creates a snapshot of the index it's iterating over at
> the start, and then allows the user code to do arbitrary index adds/removes?
> It seems this occurs frequently enough to warrant UIMA built-in support, and
> some optimizations may be available. It seems it could be especially helpful
> if (1) were implemented, because the remove/add could occur unbeknownst to
> the user. For example, the component writer may not have had a feature in any
> index, but when his component was combined with others, an index could have
> been added that used the feature.
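
A rough sketch of what a snapshot-style safeIterator() could do, assuming the index can be treated as a java.util.Collection. The SnapshotIteration class and its safeIterator method are hypothetical; a real implementation might avoid the full copy, which is where the optimizations mentioned above could come in.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class SnapshotIteration {
    // Hypothetical safeIterator(): copy the index contents once, up front, and
    // iterate over the copy. The loop body may then add to or remove from the
    // live collection without triggering ConcurrentModificationException.
    static <T> Iterator<T> safeIterator(Collection<? extends T> index) {
        List<T> snapshot = new ArrayList<T>(index);
        // Read-only view, so Iterator.remove() cannot silently edit the copy
        // instead of the real index.
        return Collections.unmodifiableList(snapshot).iterator();
    }
}
```

Because the loop walks the copy, adding to or removing from the live collection in the loop body does not trip the fail-fast check of the collection's own iterator.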
>
> WDYT?
>
> -Marshall
>
> On 12/2/2014 10:26 AM, Marshall Schor wrote:
>> Richard, your good feedback set me thinking harder about this.  I think I
>> agree that the try / finally is, in some sense, optional, and serves (only)
>> to execute the finally block in the presence of an exception.
>>
>> So, if we were to envisage a design without that, it might look as simple as 
>> this:
>>
>>   // put this at the start of a sequence where some index modifications might
>>   // cause index corruption
>>   cas.beginIndexProtection();
>>
>>   // User code, which does some modifications to existing FSs
>>   // However, fail-fast UIMA iterators may throw an exception in this
>>   // section
>>
>>   // indicate the end of the index protected sequence
>>   cas.endIndexProtection();
>>
>> Of course, you could optionally use a try / finally to execute the last line
>> in the presence of an exception.
>>
>> Some notes: 
>>
>> 1) The protection might do a remove-from-indices some FSs being modified as
>> needed, and then add them back.  at the end, which could cause iterators 
>> already
>> in existence that move to other than first or last to fail.
>>
>> 2) The protection might have an alternative style for Java 7 and later to
>> facilitate try-with-resources.  It might look like this:
>>
>>    try (IndexProtection ip = cas.beginIndexProtection()) {
>>       // user code
>>    }
>>
>> where the IndexProtection class implements AutoCloseable, and its "close()"
>> method does the cas.endIndexProtection() call.
>>
>> 3) I think the design ought to support nested begin/end protection blocks,
>> to facilitate subroutine modularity. For example, a separately developed
>> subroutine might be called by user code; this subroutine might, in turn, do
>> its own IndexProtection.
>>
>> 4) There is a danger with this design, in that a user could "forget" to put
>> the endIndexProtection in.
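
A minimal sketch of notes 1 to 4, with hypothetical names throughout (IndexProtector, IndexProtection, aboutToModify) and a java.util.Set standing in for the CAS indices. Nesting is handled with a depth counter, so only the outermost end re-adds the removed FSs; the returned handle narrows AutoCloseable.close() to throw no checked exception, so it works in try-with-resources without a catch clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Handle returned by beginIndexProtection(); close() is narrowed to throw
// no checked exception, so try-with-resources needs no catch clause.
interface IndexProtection extends AutoCloseable {
    @Override
    void close();
}

// Hypothetical sketch of nested begin/end index protection. A Set stands in
// for "all indices that contain the FS".
class IndexProtector<T> {
    private final Set<T> index;
    private final List<T> removed = new ArrayList<T>(); // FSs to add back
    private int depth = 0;                               // current nesting level

    IndexProtector(Set<T> index) {
        this.index = index;
    }

    IndexProtection beginIndexProtection() {
        depth++;
        return this::endIndexProtection; // closing the handle ends this level
    }

    // Call before changing a key feature of fs inside a protected block.
    void aboutToModify(T fs) {
        if (depth == 0) {
            throw new IllegalStateException("not inside an index protection block");
        }
        if (index.remove(fs)) {          // remove while the old key is still valid
            removed.add(fs);
        }
    }

    void endIndexProtection() {
        if (depth == 0) {
            throw new IllegalStateException("unbalanced endIndexProtection()");
        }
        depth--;
        if (depth == 0) {                // only the outermost end re-adds FSs
            index.addAll(removed);
            removed.clear();
        }
    }

    int depth() {
        return depth;
    }
}
```

Note 4 still applies to the explicit begin/end calls: nothing in this sketch stops a caller from skipping endIndexProtection() unless it uses the try-with-resources form.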
>>
>> -----------------------
>>
>> A safer design would be along the lines that Richard suggested: using
>> "functional" (in the Java 8 sense, having only one abstract method) inner
>> classes that encapsulate the user code.
>>
>> With respect to how to handle checked exceptions: in Java 8, you can declare
>> a "custom" functional interface that includes a throws clause, and use that
>> to allow the lambda to throw checked exceptions.  But you would probably end
>> up (since the user code might throw almost anything, from UIMAException to
>> IOException) saying it throws the top superclass of checked exceptions:
>> Exception.  That would require the caller to catch Exception (or include it
>> in a throws clause).
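
A sketch of the "custom functional interface with a throws clause" idea, assuming a hypothetical ProtectedBlock interface and protectIndices method (not existing UIMA API). The lambda may throw IOException directly, but the method signature ends up declaring throws Exception, which every caller must then handle.

```java
import java.io.IOException;

// Hypothetical functional interface whose single method may throw any
// checked exception.
@FunctionalInterface
interface ProtectedBlock {
    void run() throws Exception;
}

class ThrowingProtectDemo {
    static int openBlocks = 0; // stands in for the real protection state

    // Hypothetical protectIndices variant: because the body may throw
    // anything, the method itself has to declare "throws Exception".
    static void protectIndices(ProtectedBlock body) throws Exception {
        openBlocks++;
        try {
            body.run();
        } finally {
            openBlocks--;      // end of protection runs even if the body throws
        }
    }

    // Usage: the lambda throws a checked IOException without wrapping it,
    // but the caller still has to declare (or catch) Exception.
    static void readAndUpdate(boolean fail) throws Exception {
        protectIndices(() -> {
            if (fail) {
                throw new IOException("simulated read failure");
            }
        });
    }
}
```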
>>
>> The alternative would be to not have checked exceptions, and to require the
>> user code to encapsulate these as runtime exceptions.  For this particular
>> use-case, the setting of feature structure slots doesn't throw checked
>> exceptions, I think.
>>
>> So, for that design, it would look like Richard suggested.
>>
>>    // slight change of name, to indicate the function being supplied in the
>>    // encapsulation
>>    // Java 8
>>
>>    cas.protectIndices( () -> {
>>       // user code, no checked exceptions
>>    });
>>
>>    // Java 7
>>
>>    cas.protectIndices( new Runnable() { public void run() {
>>         // user code, no checked exceptions
>>    }});
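
For comparison, a minimal sketch of the Runnable-based protectIndices, with a hypothetical RunnableProtectDemo class whose log list stands in for the real begin/end bookkeeping. Pairing begin and end inside one method means the caller cannot forget the end step, and the finally clause keeps it running even when the body throws an unchecked exception.

```java
import java.util.ArrayList;
import java.util.List;

class RunnableProtectDemo {
    // Records the begin/end bookkeeping a real implementation would do.
    static final List<String> log = new ArrayList<String>();

    // Hypothetical protectIndices: begin and end are paired inside one
    // method, so the caller cannot forget the end step.
    static void protectIndices(Runnable body) {
        log.add("begin");
        try {
            body.run();
        } finally {
            log.add("end");   // add-backs would happen here, even on exceptions
        }
    }

    static void demo() {
        // Java 8 lambda form
        protectIndices(() -> log.add("lambda body"));

        // Java 7 anonymous inner class form; run() must be public
        protectIndices(new Runnable() {
            @Override
            public void run() {
                log.add("anonymous body");
            }
        });
    }
}
```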
>>
>> So, I'm now leaning toward this style (with no checked exceptions), as
>> suggested by Richard, as the most reliable way: it prevents the user from
>> "forgetting" to invoke endIndexProtection(), and is not too cumbersome to
>> write even in Java 7.
>>
>> Other opinions?
>>
>> -Marshall
>>
>>
>> On 12/1/2014 6:30 PM, Marshall Schor wrote:
>>> The anonymous inner class has a nice property that with Java 8 you can use
>>> lambdas.
>>>
>>> A problem, though, I think, is how to nicely handle thrown checked
>>> exceptions.  With lambdas, I think you can't have checked exceptions.  With
>>> anonymous inner classes, you can.  But of course the syntax is more
>>> difficult to understand.
>>>
>>> I'm not sure about the abuse part for try / finally.  (I'm not using try /
>>> catch :-) ).  The try / finally is for the purpose of putting a block scope
>>> around some code, and then executing some code at the end of a block, even
>>> if an exception is thrown.
>>>
>>> I need the signal of where the end of the block is, and to execute code
>>> there, in order to add back any FSs that might have been removed (if needed)
>>> in the body of the code while doing the feature updates.
>>>
>>> It seems that the try / finally (or Java 7's try-with-resources) has a
>>> clearer syntax for specifying this than anything else I've thought of (but
>>> maybe there's still a better way :-) ).
>>>
>>> -Marshall
>>>
>>> On 12/1/2014 4:09 PM, Richard Eckart de Castilho wrote:
>>>> On 01.12.2014, at 19:24, Marshall Schor wrote:
>>>>
>>>>> One approach would use the try / finally form:
>>>>>
>>>>>  controlVar = cas.startUimaIndexProtectedBlock();
>>>>>  try {
>>>>>    // some code which modifies an FS (or maybe multiple FSs)
>>>>>  } finally {
>>>>>    controlVar.close();  // causes any "removes" to be re-added to indices
>>>>>  }
>>>>>
>>>>> A form like the above could use, in Java 7 and later, the simpler
>>>>> try-with-resources form:
>>>>>  try (IndexProtection controlVar = cas.startUimaIndexProtectedBlock()) {
>>>>>    // some code which modifies an FS (or maybe multiple FSs)
>>>>>  }
>>>> For me, this smells a bit like abusing try/catch, although I admit that
>>>> it also has some elegance.
>>>>
>>>> Why not use an anonymous inner class like this:
>>>>
>>>> cas.transaction(new Transaction<CAS>() {
>>>>   public void perform(CAS cas) {
>>>>     // make modifications
>>>>   }
>>>> });
>>>>
>>>> AFAIK, this also works in Java versions prior to 7. It's the kind of thing
>>>> one did before lambda arrived.
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>>>
>>
>
>
