[ 
https://issues.apache.org/jira/browse/UIMA-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605203#comment-15605203
 ] 

Marshall Schor commented on UIMA-5106:
--------------------------------------

Having thought harder about this, I'd like to close this Jira as 
won't-do-it-this-way and open a new one with a slightly different goal: 
support a user-specified unique ID feature that can be added to selected 
Feature Structure (FS) type declarations.

The main difference is that users specify which FS types carry this 
additional ID, so all other FSs stay lightweight.  Some consequences:

* The built-in FSArray would not have this ID (it has no fields).
* No space cost in FSs whose types do not declare the ID.
* No space/time cost for indexing by ID those FSs the user is not 
interested in (for example, the small FSs that make up the list cells in 
the various FSLists).

Two approaches come to mind:
# Having a "reserved" feature name.  The user would declare this feature with 
range "long" on any FS type where they want the unique ID.
# Letting users designate one or more features of type long to be a unique ID, 
using an API call.

The second approach has difficulties with type-system merging: an application 
consuming someone else's aggregate and type system may not know which features 
the other party designated as unique IDs.

So I think the "reserved name" approach would be best.  Possible feature names: 
uimaBuiltInUID or uimaUID (UIMA Unique ID).
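
To make the reserved-name idea concrete, here is a rough sketch in plain Java.  
It uses no UIMA classes: TypeDecl, Fs and the registry are hypothetical 
stand-ins, and uimaUID is just one of the candidate names above.  It only 
illustrates that types which do not declare the reserved feature get no ID and 
no entry in the ID index.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the "reserved feature name" approach; not UIMA code. */
final class ReservedUidSketch {
    /** Candidate reserved feature name (an assumption, not a decided name). */
    static final String UID_FEATURE = "uimaUID";
    /** Marker value for FSs whose type did not opt in. */
    static final long NO_UID = 0L;

    /** Hypothetical stand-in for a type declaration: feature name -> range. */
    static final class TypeDecl {
        final String name;
        final Map<String, String> features = new LinkedHashMap<>();

        TypeDecl(String name) { this.name = name; }

        TypeDecl feature(String featName, String range) {
            features.put(featName, range);
            return this;
        }

        /** A type opts in only by declaring the reserved feature with range "long". */
        boolean hasUid() {
            return "long".equals(features.get(UID_FEATURE));
        }
    }

    /** Hypothetical stand-in for an FS instance. A real design would omit the
     *  slot entirely for types that do not declare the feature. */
    static final class Fs {
        final TypeDecl type;
        final long uid;

        Fs(TypeDecl type, long uid) { this.type = type; this.uid = uid; }
    }

    private long nextUid = 1;                        // incrementing ID source
    private final Map<Long, Fs> byUid = new HashMap<>();

    /** Assigns and indexes an ID only when the type declares the reserved feature. */
    Fs create(TypeDecl type) {
        if (!type.hasUid()) {
            return new Fs(type, NO_UID);             // no ID, no index entry
        }
        Fs fs = new Fs(type, nextUid++);
        byUid.put(fs.uid, fs);
        return fs;
    }

    Fs lookup(long uid) { return byUid.get(uid); }

    int indexedCount() { return byUid.size(); }

    public static void main(String[] args) {
        ReservedUidSketch reg = new ReservedUidSketch();
        TypeDecl token = new TypeDecl("Token").feature(UID_FEATURE, "long");
        TypeDecl cell = new TypeDecl("ListCell").feature("head", "FS");
        Fs t = reg.create(token);
        Fs c = reg.create(cell);
        System.out.println(token.name + " uid=" + t.uid);
        System.out.println(cell.name + " indexed=" + (reg.lookup(c.uid) != null));
    }
}
```

Because the opt-in lives in the type declaration itself, it travels with the 
type system through merging, which is the property the API-call approach lacks.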

Other thoughts welcome.  

> uv3 constant "id" for FSs (Proposed new Feature for uv3)
> --------------------------------------------------------
>
>                 Key: UIMA-5106
>                 URL: https://issues.apache.org/jira/browse/UIMA-5106
>             Project: UIMA
>          Issue Type: New Feature
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDKexp
>
>
> Add constant ID for FSs. This would be an incrementing, long value. It would 
> be constant through serialization/deserialization cycles. There would be a 
> lazily created map from longs to FSs (via weak links) to allow direct access 
> from the ID to the FS.  Lazy intent is to not have a cost for this 
> (space/time) other than the cost for 1 long / FS, if it is not used.
> We could make this feature optional, as well, to avoid the 8 bytes per FS 
> overhead, but in V3, I think that's not a good tradeoff (space savings vs 
> complexity).  
> Issues: 
> * Current design allows parallelism of services, with returned results 
> "stacked" into receiving CAS; would need to change (some of) the IDs coming 
> back.
> * CAS would need to have the high-water-mark value as part of serializations.
> Backwards compatibility:
> * loading V2 CASs: generate new IDs upon loading.
> * serializing to V2: (for connecting to V2 services): drop the IDs.
> This is a proposed new V3 feature; comments appreciated.
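
For reference, the lazily created ID-to-FS map described in the original issue 
could look roughly like the following.  This is a plain-Java sketch, not UIMA 
code: Fs, LazyIdMapSketch, enableLookup and the way existing FSs are handed in 
are all assumptions made for illustration.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Toy model of an incrementing long ID plus a lazily built weak lookup map. */
final class LazyIdMapSketch {
    /** Hypothetical stand-in for an FS carrying a constant long ID. */
    static final class Fs {
        final long id;

        Fs(long id) { this.id = id; }
    }

    private long highWaterMark = 0;                  // last ID handed out
    private Map<Long, WeakReference<Fs>> idToFs;     // null until lookup is enabled

    /** Creating an FS costs only the long unless lookup has been enabled. */
    Fs create() {
        Fs fs = new Fs(++highWaterMark);
        if (idToFs != null) {
            idToFs.put(fs.id, new WeakReference<>(fs));
        }
        return fs;
    }

    /** Builds the map on first use from FSs the caller can still reach. */
    void enableLookup(Iterable<Fs> existing) {
        if (idToFs != null) {
            return;                                  // already built
        }
        idToFs = new HashMap<>();
        for (Fs fs : existing) {
            idToFs.put(fs.id, new WeakReference<>(fs));
        }
    }

    /** Returns the FS for an ID, or null if unknown, collected, or not enabled. */
    Fs lookup(long id) {
        if (idToFs == null) {
            return null;
        }
        WeakReference<Fs> ref = idToFs.get(id);
        if (ref == null) {
            return null;
        }
        Fs fs = ref.get();
        if (fs == null) {
            idToFs.remove(id);                       // drop the stale entry
        }
        return fs;
    }

    boolean lookupEnabled() { return idToFs != null; }

    /** Value that would have to be included when serializing. */
    long highWaterMark() { return highWaterMark; }
}
```

The weak references keep the map from pinning FSs that are otherwise 
unreachable, and the high-water mark is the value that would need to travel 
with a serialized CAS so that IDs stay unique after deserialization.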



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
