[ https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283983#comment-16283983 ]
Richard Eckart de Castilho commented on UIMA-5662:
--------------------------------------------------

I'm re-reading the proposal. Reading it a second time, it seems that you do not really plan to add a new built-in type to the type system, but rather to keep one or more maps in parallel to the CAS in which the user could track IDs. This seems like the approach taken in the XMI deserializer/serializer, where (I believe) ID information can be recorded during deserialization and re-used during serialization.

I'm still not very convinced, though. E.g. in the case of the CAS/annotation editor, I'd have to keep not only the CAS around, but also the map. It seems that when adding new FSes to the CAS, I'd have to figure out the next ID manually.

True, the approach would allow supporting multiple maps. I could imagine it being an interesting approach under some conditions:

* lookups FS -> ID are fast
* lookups ID -> FS are fast (i.e. a unidirectional map would not be sufficient)
* the maps are stored directly inside the CAS, so that client code doesn't have to juggle them around manually
* the client code can set up an ID assignment strategy for each map, for cases where a new FS is created and added to the CAS
* one such strategy should allow explicitly adding an ID -> FS mapping, e.g. in reader components where IDs may be obtained from the file format being read. I.e. access to the maps should not be limited to UIMA framework code.

And some open questions:

* Would there be some way of controlling the XMI element IDs, maybe using a specially-named map, and thus remove the need for the current XmiSerializationSharedData?
* Is a general provision for transporting out-of-type-system information being introduced, and how would e.g. XMI deal with that?

There could be some risks:

* If the maps are stored like FSes during serialization (e.g. in XMI), this could cause problems with existing code that reads/writes XMI.
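The conditions listed in the comment (fast lookups in both directions, a per-map ID assignment strategy, and explicit ID -> FS registration by reader code) could be met by something like the minimal sketch below. `BidiIdMap`, its methods and the max-plus-one strategy are hypothetical names for illustration, not an existing UIMA API; the type parameter `T` stands in for a feature structure type, and storing the map inside the CAS is not shown.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical bidirectional FS <-> ID map kept alongside a CAS; not an existing UIMA API.
// T stands in for a feature structure type so the sketch compiles without UIMA on the classpath.
class BidiIdMap<T> {
    private final Map<Integer, T> idToFs = new HashMap<>();
    // Identity semantics: distinct FS objects get distinct IDs even if they are equals().
    private final Map<T, Integer> fsToId = new IdentityHashMap<>();
    // Per-map ID assignment strategy, consulted when an FS without an ID is first seen.
    private final ToIntFunction<BidiIdMap<T>> strategy;
    private int maxId = 0;

    BidiIdMap(ToIntFunction<BidiIdMap<T>> strategy) {
        this.strategy = strategy;
    }

    // One possible strategy: one past the largest ID recorded so far.
    static <T> BidiIdMap<T> withMaxPlusOne() {
        return new BidiIdMap<T>(m -> m.maxId + 1);
    }

    // Explicit ID -> FS mapping, e.g. from a reader that takes IDs from the input format.
    void put(int id, T fs) {
        T existing = idToFs.get(id);
        if (existing != null && existing != fs) {
            throw new IllegalArgumentException("ID already in use: " + id);
        }
        Integer oldId = fsToId.put(fs, id);
        if (oldId != null && oldId != id) {
            idToFs.remove(oldId); // the FS moved to a new ID; drop the stale reverse entry
        }
        idToFs.put(id, fs);
        maxId = Math.max(maxId, id);
    }

    // FS -> ID; assigns a new ID via the strategy if the FS has none yet.
    int idOf(T fs) {
        Integer id = fsToId.get(fs);
        if (id != null) {
            return id;
        }
        int newId = strategy.applyAsInt(this);
        put(newId, fs);
        return newId;
    }

    // ID -> FS; null if the ID is unknown.
    T fsOf(int id) {
        return idToFs.get(id);
    }
}
```

A reader would call `put` with IDs taken from its input, while code that later creates FSes (e.g. an annotation editor) would call `idOf` and let the strategy pick the next ID, which addresses the concern about having to compute it manually.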
For the time being, at least for me, a single hard-coded map would be sufficient. It could be used to transport ID information from formats that support it to other formats that support it, and it would be OK if there were only specific cases in which this map is effective, e.g. not if one CAS is merged into another one.

> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2 ids for FSs being preserved across deserialization and
> serialization, and 2) low-level CAS API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_perserve_ids, and also
> controllable by a new config option per deserialize call, alter the
> deserialization for those deserializers which know about v2 ids, to put these
> into the map used for low-level CAS access, using the actual v2 ids, and
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the
> use case of some annotators using low-level APIs, when part of a pipeline is
> "remoted".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
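The deserialization mode described in the quoted issue could work roughly as in this sketch, assuming a table from ID to FS whose values are held weakly (the issue notes that strong references would prevent GC of unreachable FSs). `PreservedIdTable` and its methods are hypothetical; the constructor's `preserveIds` flag stands in for the system property / per-call config option, and none of this is the actual UIMA v3 implementation.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of id-preserving deserialization bookkeeping; not UIMA v3 API.
class PreservedIdTable<T> {
    // Weak values: the table alone does not keep an FS alive.
    private final Map<Integer, WeakReference<T>> byId = new HashMap<>();
    private final boolean preserveIds;
    private int nextId = 1;

    PreservedIdTable(boolean preserveIds) {
        this.preserveIds = preserveIds;
    }

    // Called by a v2-id-aware deserializer for each FS it reads.
    int onDeserialized(int v2Id, T fs) {
        int id = preserveIds ? v2Id : nextId;
        byId.put(id, new WeakReference<>(fs));
        // Keep the next available id one beyond the largest id seen so far.
        nextId = Math.max(nextId, id + 1);
        return id;
    }

    // ID for an FS created after deserialization: always beyond every preserved id.
    int allocateNew(T fs) {
        int id = nextId++;
        byId.put(id, new WeakReference<>(fs));
        return id;
    }

    // Low-level lookup by id; null if unknown or already garbage-collected.
    T lookup(int id) {
        WeakReference<T> ref = byId.get(id);
        return ref == null ? null : ref.get();
    }
}
```

Entries whose referents have been collected are never purged here; a fuller version might register the references with a `java.lang.ref.ReferenceQueue` and remove cleared entries periodically.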