Hi,
I would say that as long as the CasCopier doesn't simply fail when it thinks a
copy would be invalid/unsafe, and as long as one can fix potentially broken
copies afterwards, it would be OK in general. OK, existing code might break...
The use case below was half hypothetical. What is very real is a reverse use case
that we have implemented in DKPro Core:
* view A contains a text
* view B is created through a transformation of the text from A
* annotations are created in view B
* annotations are copied back to view A
* offsets in the copied annotations are updated based on a reverse of the
transformation operation in the second step
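For the simple case where the transformation only deletes characters (e.g. stripping XML tags), the offset update in the last step could be sketched as below. This is a minimal sketch, assuming the deleted ranges of A were recorded while B was being created; `ReverseOffsetMap` and its methods are hypothetical names, not DKPro Core's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of UIMA or DKPro Core. It records which character
 * ranges of the original text (view A) were deleted to produce the derived text
 * (view B), and maps offsets in B back to offsets in A.
 */
final class ReverseOffsetMap {
    // Deleted ranges {start, end} (end exclusive) in A's coordinates, sorted by start.
    private final List<int[]> deletions = new ArrayList<>();

    /** Records that A's characters [start, end) were removed; call in increasing start order. */
    void addDeletion(int start, int end) {
        deletions.add(new int[] {start, end});
    }

    /** Maps a begin offset in B to A; a begin at a deletion boundary skips past the deleted range. */
    int toSourceBegin(int offsetInB) {
        return map(offsetInB, true);
    }

    /** Maps an (exclusive) end offset in B to A; an end at a deletion boundary stays before it. */
    int toSourceEnd(int offsetInB) {
        return map(offsetInB, false);
    }

    private int map(int offsetInB, boolean isBegin) {
        int offsetInA = offsetInB;
        for (int[] d : deletions) {
            // A deletion lying before the current position shifts it right by the deletion's length.
            boolean before = isBegin ? d[0] <= offsetInA : d[0] < offsetInA;
            if (!before) {
                break;
            }
            offsetInA += d[1] - d[0];
        }
        return offsetInA;
    }
}
```

With A = `<b>Hello</b> world` and deletions [0, 3) and [8, 12), the annotation `Hello` at [0, 5) in B maps back to [3, 8) in A. Begin and end offsets are mapped separately so that a span never swallows an adjacent deleted tag; zero-length annotations sitting exactly on a deletion boundary would need extra care.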
The code we currently use to handle the copying back looks like this:
    CasCopier copier = new CasCopier(inputCas, outputCas);
    for (FeatureStructure fs : selectFS(inputCas, getType(inputCas, typeName))) {
        // Skip FSes already copied, e.g. because a previously copied FS referenced them
        if (!copier.alreadyCopied(fs)) {
            FeatureStructure fsCopy = copier.copyFs(fs);
            // Make sure that the sofa feature of the copy is set
            // (mDestSofaFeature is declared outside this snippet)
            if (fs instanceof AnnotationBaseFS) {
                FeatureStructure sofa = fsCopy.getFeatureValue(mDestSofaFeature);
                if (sofa == null) {
                    fsCopy.setFeatureValue(mDestSofaFeature, outputCas.getSofa());
                }
            }
            aOutput.addFsToIndexes(fsCopy);
        }
    }
Source:
https://github.com/dkpro/dkpro-core/blob/7c8785647ca8c5905aa108251935069e601cbb8d/dkpro-core-api-transform-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/transform/JCasTransformer_ImplBase.java#L99
I guess this code would still work and would not throw exceptions.
If I understand the diagrams in the wiki correctly, there is one case where the
sofa of the copied FS points to the source view but the FS is indexed in the
target view. This seems to be the only difference between copying between CASes
and copying within a CAS. I think it may be better/simpler/more consistent to
set the sofa of the copy to null in both cases; if users really want the FS to
point to a sofa in a different view, they should set the sofa that way manually
after the copy is complete.
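The semantics proposed here (the copy's sofa always starts out as null; the caller assigns it explicitly) can be modelled with a few plain classes. This is only a toy model to make the proposal concrete: `ToySofa`, `ToyAnnotation` and `ToyCopier` are hypothetical and are not UIMA's `CasCopier` API.

```java
/** Hypothetical stand-in for a view's sofa; not the UIMA SofaFS interface. */
final class ToySofa {
    final String viewName;

    ToySofa(String viewName) {
        this.viewName = viewName;
    }
}

/** Hypothetical stand-in for an annotation: a sofa reference plus begin/end offsets. */
final class ToyAnnotation {
    ToySofa sofa;
    int begin;
    int end;

    ToyAnnotation(ToySofa sofa, int begin, int end) {
        this.sofa = sofa;
        this.begin = begin;
        this.end = end;
    }
}

/** Models the proposed rule: a copy never inherits a sofa, whether within one CAS or across CASes. */
final class ToyCopier {
    static ToyAnnotation copy(ToyAnnotation source) {
        // Same offsets, but the sofa is left unset (null) in every case.
        return new ToyAnnotation(null, source.begin, source.end);
    }

    /** Caller-side fix-up after copying, mirroring the DKPro Core loop quoted above. */
    static void setSofaIfMissing(ToyAnnotation copy, ToySofa target) {
        if (copy.sofa == null) {
            copy.sofa = target;
        }
    }
}
```

A caller who really wants the copy to point at a sofa in a different view would assign `copy.sofa` directly instead of calling `setSofaIfMissing`.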
Btw... at least when copying individual FSes, the CasCopier doesn't index the
copy anyway. So we are only talking about the bulk-copy method?
Cheers,
-- Richard
> On 01.04.2016, at 15:57, Marshall Schor <[email protected]> wrote:
>
> Hi Richard,
>
> Thanks for this use-case. I think there may be 2 subcases.
>
> 1) The views, A and B, are in the same CAS, and
> 2) The views, A and B, are in different CASes
>
> In case 1), with this new proposal the annotations copied from view A to B
> would have their "sofa" reference continue to point to the text in view A.
> This means:
>
> a) The references into the text are still "valid", but of course point to the
> text in view A.
> b) To update them so that they point to the de-XML'ed version of the text,
> not only do the begin/end references need to be updated, but the sofa
> reference needs to be changed as well. We could add an API to update that to
> the current view's.
>
> In case 2), the annotations in B would no longer have a valid sofa reference
> at all (it would be set to null).
> This would clearly be a problem; but once again, we could add an API to update
> that to the current view's.
>
> --------------------------------
>
> So, it looks like this proposed design change would break the use-case you
> suggested.
>
> The current design would seem to support this use case, but only if the two
> views are in different CASes.
> If they were in the same CAS, I think the current implementation (not tested,
> just reading the code) would leave the copied annotations with sofa references
> pointing to the sofa of view A.
>
> Does this match what you're currently seeing?
>
> -Marshall
>
>
> On 3/31/2016 4:36 PM, Richard Eckart de Castilho wrote:
>> On 31.03.2016, at 21:22, Marshall Schor <[email protected]> wrote:
>>> I'm thinking of changing how cas copier works with respect to managing Sofas
>>> and sofa ref updating. I've written something up here:
>>> https://cwiki.apache.org/confluence/display/UIMA/CasCopier+and+Views
>>>
>>> Comments / feedback / what did I overlook? appreciated :-) -Marshall
>> Consider the following case:
>>
>> - there are two views, A and B
>> - the text in B has been derived from A through some transformation, e.g.
>> the removal of XML tags
>> - A contains UIMA annotations that represent the XML tags and that point into
>> the text in A
>> - as part of a second transformation process, all annotations in A are to be
>> copied into B
>> - after the copy has been performed, the offsets of the copied annotations
>> are updated
>>
>> Would such a scenario still be supported after the changes you suggest?
>>
>> Best,
>>
>> -- Richard
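For the scenario in the original question (B's text derived from A by removing XML tags, then the offsets of the annotations copied into B updated), the forward offset update could look roughly like this. It is a simplified sketch: `TagStripper` is a hypothetical name, and the tag handling is naive (every `<...>` is treated as a tag; comments, CDATA and entities are ignored).

```java
/**
 * Hypothetical helper, not part of UIMA or DKPro Core. Naively removes XML tags
 * (anything between '<' and '>') and records how offsets in the original text
 * (view A) map to offsets in the stripped text (view B).
 */
final class TagStripper {
    final String strippedText;
    // aToB[i] is the offset in the stripped text that corresponds to offset i in the
    // original text; offsets inside a removed tag collapse onto the tag's position.
    final int[] aToB;

    TagStripper(String original) {
        StringBuilder out = new StringBuilder();
        aToB = new int[original.length() + 1];
        boolean inTag = false;
        for (int i = 0; i < original.length(); i++) {
            aToB[i] = out.length();
            char c = original.charAt(i);
            if (c == '<') {
                inTag = true;
            }
            if (!inTag) {
                out.append(c);
            }
            if (c == '>') {
                inTag = false;
            }
        }
        aToB[original.length()] = out.length();
        strippedText = out.toString();
    }

    /** Maps a copied annotation's [begin, end) from A's offsets to B's offsets. */
    int[] mapSpan(int begin, int end) {
        return new int[] {aToB[begin], aToB[end]};
    }
}
```

For A = `<b>Hello</b> world`, the stripped text is `Hello world`, and an annotation covering the whole element `<b>Hello</b>`, i.e. [0, 12) in A, ends up covering `Hello`, i.e. [0, 5) in B. Updating the sofa reference of the copy is a separate step that this sketch does not touch.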