[ https://issues.apache.org/jira/browse/UIMA-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605494#comment-15605494 ]
Marshall Schor commented on UIMA-5106:
--------------------------------------
Another user suggests we consider using OIDs (object identifiers):
https://en.wikipedia.org/wiki/Object_identifier
The general use case is for future UIMA uses where Feature Structures are
generated and stored in a potentially widely distributed manner.
This could solve this problem:
* a client sends a CAS to 2 services (for parallel processing), which both
process it and return it
* the client (first thought) would adjust the unique IDs of the new feature
structures in one of the returned CASs. This is actually done today for the
internal IDs.
But, on reflection, the purpose of having the unique ID may well be to store
that value in other features too. I don't think there is a reasonable way to
find all those uses and re-adjust them as well.
Using OIDs solves this, because they don't need adjusting. It could be
implemented along these lines:
* normal OIDs for new FSs would be, for instance ".1", ".2", ...
* new FSs produced at a service called by a client would have OIDs of .8.1,
.8.2, ... for one service, and .9.1, .9.2, ... for another.
* These OIDs would never need adjusting.
* The prefix (.8 or .9 in the above example) could be generated by the client
and sent along with the CAS on each remote service call.
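
The prefix scheme above could be sketched roughly as follows. This is only an
illustration: OidAllocator, nextOid and childFor are hypothetical names, not
part of any UIMA API, and representing OIDs as dotted strings is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical allocator for prefix-based OIDs; not part of the UIMA API. */
final class OidAllocator {
    // "" for the client itself, ".8" or ".9" for a remote service (assumed form).
    private final String prefix;
    private final AtomicLong next = new AtomicLong(1);

    OidAllocator(String prefix) {
        this.prefix = prefix;
    }

    /** Returns the next OID under this allocator's prefix, e.g. ".1" or ".8.1". */
    String nextOid() {
        return prefix + "." + next.getAndIncrement();
    }

    /**
     * Creates the allocator a client would hand to one remote service call.
     * The caller chooses the arc (8, 9, ...) and must not reuse it.
     */
    OidAllocator childFor(long arc) {
        return new OidAllocator(prefix + "." + arc);
    }
}
```

A client would create one allocator with an empty prefix for its own FSs and
call childFor(8) and childFor(9) for the two parallel services. Because the
two services allocate under different prefixes, the returned FSs can be merged
without renumbering.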
Combining this with the facility to attach these IDs only to the subset of
Feature Structures that users want unique IDs for (using the reserved feature
name, which we might call uimaOID), this feels like a good direction to
consider, especially for use cases farther in the future.
> uv3 constant "id" for FSs (Proposed new Feature for uv3)
> --------------------------------------------------------
>
> Key: UIMA-5106
> URL: https://issues.apache.org/jira/browse/UIMA-5106
> Project: UIMA
> Issue Type: New Feature
> Components: Core Java Framework
> Reporter: Marshall Schor
> Priority: Minor
> Fix For: 3.0.0SDKexp
>
>
> Add constant ID for FSs. This would be an incrementing, long value. It would
> be constant through serialization/deserialization cycles. There would be a
> lazily created map from longs to FSs (via weak links) to allow direct access
> from the ID to the FS. Lazy intent is to not have a cost for this
> (space/time) other than the cost for 1 long / FS, if it is not used.
> We could make this feature optional, as well, to avoid the 8 bytes per FS
> overhead, but in V3, I think that's not a good tradeoff (space savings vs
> complexity).
> Issues:
> * Current design allows parallelism of services, with returned results
> "stacked" into receiving CAS; would need to change (some of) the IDs coming
> back.
> CAS would need to have the high-water-mark value as part of serializations.
> Backwards compatibility:
> * loading V2 CASs: generate new IDs upon loading.
> * serializing to V2: (for connecting to V2 services): drop the IDs.
> This is a proposed new V3 feature; comments appreciated.
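
For illustration, the lazily created ID-to-FS map from the quoted description
could look something like the sketch below. FsIdRegistry and its methods are
hypothetical names, the type parameter T merely stands in for a Feature
Structure class, and the double-checked locking is one possible choice rather
than the proposed design.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; T stands in for a Feature Structure type. */
final class FsIdRegistry<T> {
    // Last ID handed out; this is the value a serialization would need to carry.
    private final AtomicLong highWaterMark = new AtomicLong(0);
    // Stays null until someone actually registers an FS for lookup by ID.
    private volatile Map<Long, WeakReference<T>> byId;

    /** Hands out the next incrementing ID. */
    long nextId() {
        return highWaterMark.incrementAndGet();
    }

    long highWaterMark() {
        return highWaterMark.get();
    }

    /** Records an FS under its ID via a weak link, creating the map on first use. */
    void register(long id, T fs) {
        map().put(id, new WeakReference<>(fs));
    }

    /** Returns the FS for the ID, or null if unknown or already collected. */
    T lookup(long id) {
        Map<Long, WeakReference<T>> m = byId;
        if (m == null) {
            return null;
        }
        WeakReference<T> ref = m.get(id);
        return ref == null ? null : ref.get();
    }

    boolean isMapCreated() {
        return byId != null;
    }

    private Map<Long, WeakReference<T>> map() {
        Map<Long, WeakReference<T>> m = byId;
        if (m == null) {
            synchronized (this) {
                if (byId == null) {
                    byId = new ConcurrentHashMap<>();
                }
                m = byId;
            }
        }
        return m;
    }
}
```

As long as nothing calls register, the only per-FS cost is the long itself.
This sketch never removes map entries whose FS has been collected; a fuller
version would need to prune them.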
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)