[ 
https://issues.apache.org/jira/browse/UIMA-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605494#comment-15605494
 ] 

Marshall Schor commented on UIMA-5106:
--------------------------------------

Another user suggests that we consider using OIDs (object identifiers):
https://en.wikipedia.org/wiki/Object_identifier

The general use case is for future UIMA uses where Feature Structures are 
generated and stored in a potentially widely distributed manner.

This could solve the following problem:
* a client sends a CAS to 2 services (for parallel processing), which both 
process it and return it
* the client (as a first thought) would adjust the unique IDs of the new 
feature structures in one of the returned CASes.  This is actually done today 
for the internal IDs.

But, on reflection, the purpose of having a unique ID is presumably to store 
that value in other features as well.  I think there is no reasonable way to 
find all those uses and re-adjust them too.

Using OIDs solves this, because they don't need adjusting.  It could be 
implemented along these lines:
* normal OIDs for new FSs would be, for instance ".1", ".2", ...
* new FSs produced at a service on behalf of a client would have OIDs of 
.8.1, .8.2, ... for one service, and .9.1, .9.2, etc. for another.
* These OIDs would never need adjusting.  
* The prefix (.8 or .9 in the above example) could be generated by the 
client, and sent along with the CAS on each remote service call.
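
The prefix scheme above can be sketched roughly as follows.  This is only an 
illustration, not UIMA code: the class OidAllocator and its methods are 
hypothetical, and it assumes the client reserves a number from its own 
sequence to use as each service's prefix, so that a prefix never collides 
with an OID the client assigns itself.

```java
/** Hypothetical sketch of hierarchical OID allocation; not part of UIMA. */
final class OidAllocator {
    private final String prefix; // "" for the client, e.g. ".8" for a service
    private long next = 1;       // next unused number under this prefix

    OidAllocator(String prefix) {
        this.prefix = prefix;
    }

    /** Returns the next OID under this allocator's prefix, e.g. ".1" or ".8.1". */
    String nextOid() {
        return prefix + "." + next++;
    }

    /**
     * Assumption: the client reserves one of its own numbers and sends it
     * along with the CAS as the prefix for a remote service call.
     */
    OidAllocator childForService() {
        return new OidAllocator(nextOid());
    }

    public static void main(String[] args) {
        OidAllocator client = new OidAllocator("");
        System.out.println(client.nextOid());        // OID for a client-side FS
        OidAllocator serviceA = client.childForService();
        OidAllocator serviceB = client.childForService();
        System.out.println(serviceA.nextOid());      // OID for an FS made at service A
        System.out.println(serviceB.nextOid());      // OID for an FS made at service B
    }
}
```

Because the two services allocate under different prefixes, the FSs they 
return can be merged into the client's CAS without rewriting any OID, 
including copies of an OID that were stored in other features.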

Combining this with the facility to attach these IDs only to the subset of 
Feature Structures that users want unique IDs for (using a reserved feature 
name, which we might call uimaOID), this feels like a good direction to 
consider, especially for use cases farther in the future.

> uv3 constant "id" for FSs (Proposed new Feature for uv3)
> --------------------------------------------------------
>
>                 Key: UIMA-5106
>                 URL: https://issues.apache.org/jira/browse/UIMA-5106
>             Project: UIMA
>          Issue Type: New Feature
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDKexp
>
>
> Add a constant ID for FSs. This would be an incrementing long value that 
> stays constant through serialization/deserialization cycles. There would be 
> a lazily created map from longs to FSs (via weak links) to allow direct 
> access from the ID to the FS.  The intent of the laziness is that, if the 
> map is not used, the only cost (space/time) is 1 long per FS.
> We could also make this feature optional, to avoid the 8 bytes per FS 
> overhead, but in V3 I think that's not a good tradeoff (space savings vs 
> complexity).  
> Issues: 
> * Current design allows parallelism of services, with returned results 
> "stacked" into receiving CAS; would need to change (some of) the IDs coming 
> back.
> * CAS would need to have the high-water-mark value as part of serializations.
> Backwards compatibility:
> * loading V2 CASs: generate new IDs upon loading.
> * serializing to V2: (for connecting to V2 services): drop the IDs.
> This is a proposed new V3 feature; comments appreciated.
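
The lazily created ID-to-FS map mentioned in the issue description could look 
roughly like the sketch below.  It is only an illustration under stated 
assumptions: Fs and LazyIdIndex are hypothetical stand-ins, not UIMA classes, 
and the list of all FSs stands in for whatever the CAS uses to enumerate its 
feature structures.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a feature structure; not a UIMA class. */
final class Fs {
    final long id; // the constant ID: the only per-FS cost if the map is unused

    Fs(long id) {
        this.id = id;
    }
}

/** Hypothetical sketch of a lazily built ID-to-FS lookup. */
final class LazyIdIndex {
    // Stands in for the CAS's own FS storage. Note that this list holds strong
    // references, so in this simplified sketch the weak links are never cleared.
    private final List<Fs> allFs = new ArrayList<>();
    private long highWaterMark = 0;            // would be serialized with the CAS
    private Map<Long, WeakReference<Fs>> byId; // stays null until the first lookup

    /** Creates an FS with the next ID; touches the map only if it already exists. */
    Fs create() {
        Fs fs = new Fs(++highWaterMark);
        allFs.add(fs);
        if (byId != null) {
            byId.put(fs.id, new WeakReference<>(fs));
        }
        return fs;
    }

    /** Looks up an FS by ID, building the map on first use. Returns null if absent. */
    Fs get(long id) {
        if (byId == null) {
            byId = new HashMap<>();
            for (Fs fs : allFs) {
                byId.put(fs.id, new WeakReference<>(fs));
            }
        }
        WeakReference<Fs> ref = byId.get(id);
        return ref == null ? null : ref.get();
    }

    /** True once a lookup has forced the map to be built. */
    boolean isIndexBuilt() {
        return byId != null;
    }
}
```

Until get is first called, creating FSs costs only the long stored in each 
one, which matches the stated intent of the lazy design.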



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
