Hi,

I would say as long as the CasCopier doesn't simply fail if it thinks that a 
copy would be invalid/unsafe, and as long as one can fix potentially broken 
copies afterwards, it would be ok in general. Ok, existing code might break...

The use-case below was half hypothetical. Very real is a reverse use-case which 
we have implemented in DKPro Core.

* view A contains a text
* view B is created through a transformation of the text from A
* annotations are created in view B
* annotations are copied back to view A
* offsets in the copied annotations are updated based on a reverse of the 
transformation operation in the second step
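
As a rough illustration of that last step (this is not the DKPro Core code), a 
reverse offset mapping could be sketched as below. It assumes the 
transformation merely deletes tag-like spans between '<' and '>'; the class 
and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of UIMA or DKPro Core. */
class OffsetMapSketch {
    // Transformed text (what view B would hold in the use-case above).
    private final StringBuilder stripped = new StringBuilder();
    // origOffset.get(i) is the offset in the original text of character i of
    // the transformed text; one trailing entry maps the end-of-text position.
    private final List<Integer> origOffset = new ArrayList<>();

    /** Forward transformation: drop everything between '<' and '>'. */
    OffsetMapSketch(String original) {
        boolean inTag = false;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                stripped.append(c);
                origOffset.add(i);
            }
        }
        origOffset.add(original.length());
    }

    String text() {
        return stripped.toString();
    }

    /**
     * Reverse mapping: converts a [begin, end) span over the transformed text
     * into a span over the original text. The end is mapped via the last
     * covered character so that trailing tags are not pulled into the span.
     */
    int[] mapSpan(int begin, int end) {
        int b = origOffset.get(begin);
        int e = end > begin ? origOffset.get(end - 1) + 1 : b;
        return new int[] { b, e };
    }
}
```

For each annotation copied back to view A, one would then call mapSpan with 
the annotation's begin/end and write the result into the copy.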

The code we currently use to handle the copying back looks like this:

CasCopier copier = new CasCopier(inputCas, outputCas);

for (FeatureStructure fs : selectFS(inputCas, getType(inputCas, typeName))) {
  if (!copier.alreadyCopied(fs)) {
    FeatureStructure fsCopy = copier.copyFs(fs);
    // Make sure that the sofa annotation in the copy is set
    if (fs instanceof AnnotationBaseFS) {
      FeatureStructure sofa = fsCopy.getFeatureValue(mDestSofaFeature);
      if (sofa == null) {
        fsCopy.setFeatureValue(mDestSofaFeature, outputCas.getSofa());
      }
    }
    aOutput.addFsToIndexes(fsCopy);
  }
}

Source: 
https://github.com/dkpro/dkpro-core/blob/7c8785647ca8c5905aa108251935069e601cbb8d/dkpro-core-api-transform-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/transform/JCasTransformer_ImplBase.java#L99

I guess this code would still work and wouldn't throw exceptions or such.

If I understand the diagrams in the wiki correctly, there is one case where the 
sofa of the copied FS points to the source view but the FS is indexed in the 
target view. This seems to be the only difference between copying between 
CASes and copying within a CAS. I think it may be better/simpler/more 
consistent to set the sofa of the copy to null in both cases. If the user 
really wants the FS to point to a sofa in a different view, then they should 
set the sofa manually after the copy is complete.

Btw... at least when copying individual FSes, the copy isn't indexed anyway by 
the CasCopier. We are talking only about the bulk-copy method then?

Cheers,

-- Richard

> On 01.04.2016, at 15:57, Marshall Schor <[email protected]> wrote:
> 
> Hi Richard,
> 
> Thanks for this use-case.  I think there may be 2 subcases.
> 
> 1) The views, A and B, are in the same CAS, and
> 2) The views, A and B, are in different CASes
> 
> In case 1), with this new proposal the annotations copied from view A to B
> would have their "sofa" reference continue to point to the text in view A.
> This means:
> 
> a) The references into the text are still "valid", but of course point to the
> text in view A.
> b) To do the updating process to have them point to the de-xml'ed version of
> the text, not only do the begin/end references need to be updated, but the
> sofa reference needs to be changed.  We could add an API to update that to
> the current view's.
> 
> In case 2), the annotations in B would no longer have a valid sofa reference
> at all (it would be set to null).
> This would clearly be a problem; but once again, we could add an API to update
> that to the current view's.
> 
> --------------------------------
> 
> So, it looks like this proposed design change would break the use-case you
> suggested. 
> 
> The current design would seem to support this use case but only if the two
> views are in different CASes.
> If they were in the same CAS, I think the current implementation (not tested,
> just reading the code) would have the copied Annotations have their sofa
> references be to the sofa in CAS A.
> 
> Does this match what you're currently seeing?
> 
> -Marshall
> 
> 
> On 3/31/2016 4:36 PM, Richard Eckart de Castilho wrote:
>> On 31.03.2016, at 21:22, Marshall Schor <[email protected]> wrote:
>>> I'm thinking of changing how cas copier works with respect to managing
>>> Sofas and sofa ref updating.  I've written something up here:
>>> https://cwiki.apache.org/confluence/display/UIMA/CasCopier+and+Views
>>> 
>>> Comments / feedback / what did I overlook?  appreciated :-) -Marshall
>> Consider the following case:
>> 
>> - there are two views, A and B
>> - the text in B has been derived from A through some transformation, e.g. 
>> the removal of XML tags
>> - A contains UIMA annotations that represent the XML tags and that point
>> into the text in A
>> - as part of a second transformation process, all annotations in A are to be 
>> copied into B
>> - after the copy has been performed, the offsets of the copied annotations 
>> are updated
>> 
>> Would such a scenario still be supported after the changes you suggest?
>> 
>> Best,
>> 
>> -- Richard
