Hi,
I changed CasIOUtils to use the header, and I extended the header with a bit (0x08) indicating an included type system. There is no information about the serialization format of the type system yet. The Java-serialized formats now also have a binary header, because I did not want to make the header serializable; it should be read and written by the same functionality for all formats.

I had assumed that old UIMA versions (e.g., 2.8.1) would be able to load the new CAS files, but my tests failed. No idea yet why.

Overall I am not very happy with the current solution, but I could live with it. Maybe someone wants to take a look at it?

Best,

Peter

On 20.07.2016 at 14:30, Peter Klügl wrote:
> Hi,
>
> I'll try to find the time to make these changes this week, next week at the latest.
>
> BTW, input stream sniffing to distinguish XMI from XCAS is currently not supported. There could be a lot of text before the relevant element occurs, e.g., license text.
>
> Best,
>
> Peter
>
> On 20.07.2016 at 14:19, Marshall Schor wrote:
>> Hi,
>>
>> We can change the header, but:
>>
>> The changed header ought to be "readable" by previous versions of UIMA.
>>
>> XMI and XCAS do not currently have special headers, and if we added them, those formats could not be read by older versions of UIMA. Those formats do, however, begin with strings that are sufficient to tell them apart.
>>
>> The XMI format is also specified in an OASIS standard, which the UIMA project is said to (mostly) follow: http://uima.apache.org/uima-specification.html
>>
>> For binary serializations, I think there's room in the header for an extra bit which, if on, could indicate that a type system was included. I think it would be good to have a header extension, when type systems are included, to specify the format and version of the type system serialization.
>>
>> Most serializations in core UIMA have not included the type system. The one which does is CASCompleteSerializer.
>> This is a "serializable" (using standard Java serialization) object containing serializable forms of the CAS and the type system.
>>
>> Regarding making methods in CommonSerDes public:
>>
>> It is fine to make them public in the sense that they are accessible from other packages, not in a sub-type hierarchy. But I think it is best not to include CommonSerDes in a package which is intended for end users, because the end-user UIMA APIs should be (as much as possible) stable over a long time period. Details of how we evolve headers, etc., should not disturb end users, if possible; keeping these public but in packages with names like xxx.impl or xyz.internal.abc is the way this has traditionally been done. It allows us to evolve these without affecting end-user APIs.
>>
>> Just to be clear: I would not consider uimaFIT and Ruta to be "end users", as they are developed within the UIMA project, and we are willing to evolve them together with UIMA core changes.
>>
>> We don't have a deadline for the next release, but it's mostly ready to go, and it will solve a significant issue for people wanting to upgrade their Eclipse to Neon :-).
>>
>> -Marshall
>>
>> On 7/20/2016 5:03 AM, Peter Klügl wrote:
>>> OK, after looking at the code I must admit that there is much more to do than I expected. We first need to discuss several things:
>>>
>>> - can we change the header at all?
>>>
>>> - do we support type system inclusion in the header?
>>>
>>> - do we support type system inclusion in the serialized files?
>>>
>>> - which serial formats are which?
>>>
>>> - can we make the methods in CommonSerDes public?
>>>
>>> What is the deadline for the release? I am quite loaded with work until next Wednesday :-(
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> On 19.07.2016 at 22:39, Marshall Schor wrote:
>>>> Great.
>>>>
>>>> There's now also common code for writing / reading UIMA serialization headers, in
>>>>
>>>> CommonSerDes (in org.apache.uima.cas.impl)
>>>>
>>>> This includes the extensions to support versioning the serializations, which start to be needed in the next release because a bug fix slightly changes the serialized form for **delta binary** CAS.
>>>>
>>>> So, it would be good to use that rather than have another separate header reader/writer to maintain.
>>>>
>>>> -Marshall
>>>>
>>>> On 7/19/2016 4:13 PM, Peter Klügl wrote:
>>>>> Ah, I didn't know about that enum. I'll adapt the code and the enum.
>>>>>
>>>>> On 19.07.2016 at 20:09, Marshall Schor wrote:
>>>>>> We already have an enum in the core for various serial formats. The class is
>>>>>>
>>>>>> public enum SerialFormat {
>>>>>>   UNKNOWN,
>>>>>>   XCAS,                  // with reachability filtering
>>>>>>   XMI,                   // with reachability filtering
>>>>>>   BINARY,                // no filtering
>>>>>>   COMPRESSED,            // no filtering (form 4)
>>>>>>   COMPRESSED_FILTERED,   // with reachability and type and feature filtering (form 6)
>>>>>>   COMPRESSED_PROJECTION, // with subset of views
>>>>>> }
>>>>>>
>>>>>> (I don't think COMPRESSED_PROJECTION is in use...)
>>>>>>
>>>>>> This has been around for maybe 3 years. I would be in favor of considering using and/or extending this as needed, rather than having two format enums (that is, adding the proposed SerializationFormat class).
>>>>>>
>>>>>> -Marshall
>>>>>>
>>>>>> On 7/19/2016 2:49 AM, Peter Klügl wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> yes, the class should be officially available to external code. I already included it in the CAS Editor and in Ruta. I also plan to use it in our in-house code. I'll change the enforcer rule.
>>>>>>>
>>>>>>> I can write the docs, but any help is welcome since I do not know how much spare time I will have for this during the rest of the week. I'll take a look at where the documentation should be added. I haven't looked at it for some time ;-)
>>>>>>>
>>>>>>> I just chose the name of the class Richard contributed since I thought it was really suitable. Then I also noticed the uimaFIT class. This is not a really good situation, but I would not change the name because of it.
>>>>>>>
>>>>>>> I would not split the API from the implementation. I do not see any advantages right now. The class is just a simple utils class with only static methods, like CasCreationUtils (which is also not separated).
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 18.07.2016 at 22:26, Marshall Schor wrote:
>>>>>>>> This is OK with me. I can even volunteer to write the docs (but am happy to have others do it :-) ).
>>>>>>>>
>>>>>>>> I'll wait to hear about the split (if any) between the public API and the impl.
>>>>>>>>
>>>>>>>> And we'll need to change the next version number to 2.9.0, from 2.8.2, because this is that kind of a change.
>>>>>>>>
>>>>>>>> Is everyone OK with all of this?
>>>>>>>>
>>>>>>>> -Marshall
>>>>>>>>
>>>>>>>> On 7/18/2016 2:39 PM, Richard Eckart de Castilho wrote:
>>>>>>>>> I believe the intention is that this class becomes part of the public API.
>>>>>>>>>
>>>>>>>>> Also, my understanding is that it would do a superset of what the uimaFIT class by the same name does. We could then probably deprecate the respective uimaFIT class and suggest using the core class instead.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> -- Richard
>>>>>>>>>
>>>>>>>>>> On 18.07.2016, at 20:30, Marshall Schor wrote:
>>>>>>>>>>
>>>>>>>>>> This is a new class added to the uimaj-core project, in the org.apache.uima.util package.
>>>>>>>>>> This is fine if this is to be part of the official public APIs supported by UIMA going forward; but if that is the case, it should probably be documented in the UIMA docs, and we'd have to change the version number (due to enforcer rules).
>>>>>>>>>>
>>>>>>>>>> If this is more of an internal-use utility, then it should be in one of the internal-use packages, such as
>>>>>>>>>>
>>>>>>>>>> org.apache.uima.internal.util
>>>>>>>>>>
>>>>>>>>>> This class is similarly named to a uimaFIT class; are these related?
>>>>>>>>>>
>>>>>>>>>> If some of the APIs are to be permanent, public, and part of the official public APIs, but some are internal implementation details, please consider using an interface and an ".impl" (or equivalent) approach; packages which support this are:
>>>>>>>>>>
>>>>>>>>>> org.apache.uima.util and
>>>>>>>>>>
>>>>>>>>>> org.apache.uima.util.impl
>>>>>>>>>>
>>>>>>>>>> --------------
>>>>>>>>>>
>>>>>>>>>> If this is only an internal kind of change, not intended to affect the official UIMA APIs, then moving it to the internal.util package will fix the "enforcer" error the build is currently getting.
>>>>>>>>>>
>>>>>>>>>> -Marshall
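Peter's description at the top of the thread (a flag bit 0x08 in the binary header marking an included type system, plus a plain binary header written in front of the Java-serialized formats instead of a serializable header object) could look roughly like the sketch below. Only the 0x08 value comes from the thread; the 4-byte "UIMA" magic, the single int flags word, and the class and method names are hypothetical and do not reproduce the actual CommonSerDes layout.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch only: NOT the real CommonSerDes header layout.
final class HeaderSketch {

  // Hypothetical 4-byte magic marking a UIMA binary stream.
  static final byte[] MAGIC = "UIMA".getBytes(StandardCharsets.US_ASCII);

  // Flag value from the thread: set when a type system is included.
  static final int TYPE_SYSTEM_INCLUDED = 0x08;

  /** What the reader found: the raw flags word and the deserialized payload. */
  static final class Result {
    final int flags;
    final Object payload;

    Result(int flags, Object payload) {
      this.flags = flags;
      this.payload = payload;
    }

    boolean typeSystemIncluded() {
      return (flags & TYPE_SYSTEM_INCLUDED) != 0;
    }
  }

  /** Writes the binary header by hand, then the payload with Java serialization. */
  static void write(OutputStream out, Serializable payload, boolean withTypeSystem)
      throws IOException {
    DataOutputStream data = new DataOutputStream(out); // unbuffered pass-through
    data.write(MAGIC);
    data.writeInt(withTypeSystem ? TYPE_SYSTEM_INCLUDED : 0);
    data.flush();
    // The header itself is never Java-serialized; only the payload is.
    ObjectOutputStream oos = new ObjectOutputStream(out);
    oos.writeObject(payload);
    oos.flush();
  }

  /** Reads and checks the header, then deserializes the payload that follows it. */
  static Result read(InputStream in) throws IOException, ClassNotFoundException {
    DataInputStream data = new DataInputStream(in); // does not read ahead
    byte[] magic = new byte[MAGIC.length];
    data.readFully(magic);
    if (!Arrays.equals(magic, MAGIC)) {
      throw new IOException("not a UIMA binary stream");
    }
    int flags = data.readInt();
    // The stream is now positioned right after the header.
    ObjectInputStream ois = new ObjectInputStream(in);
    return new Result(flags, ois.readObject());
  }
}
```

Because the header is handled outside Java serialization, the same header code can serve every binary format, which is the motivation Peter gives for not making the header serializable.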

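On the XMI/XCAS sniffing question: rather than scanning a fixed number of leading bytes, which can fail when a long license comment precedes the root element, one option is to let a StAX pull parser skip the XML declaration and comments and look only at the root element's local name. This is a sketch, not existing UIMA code: it assumes XMI documents use an `xmi:XMI` root element and XCAS documents a `CAS` root element, and the `Guess` enum is a local stand-in for `SerialFormat`. The parser consumes the stream, so a caller would have to buffer the input (or reopen the source) before handing it to the real deserializer.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; not part of UIMA.
final class XmlFormatSniffer {

  // Local stand-in for the relevant SerialFormat values.
  enum Guess { XMI, XCAS, UNKNOWN }

  static Guess sniff(InputStream in) {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Do not process DTDs while sniffing.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    try {
      XMLStreamReader reader = factory.createXMLStreamReader(in);
      try {
        while (reader.hasNext()) {
          // Comments and whitespace before the root element are skipped,
          // however long they are (e.g. a license header).
          if (reader.next() == XMLStreamConstants.START_ELEMENT) {
            String name = reader.getLocalName();
            if ("XMI".equals(name)) return Guess.XMI;  // assumed root: <xmi:XMI ...>
            if ("CAS".equals(name)) return Guess.XCAS; // assumed root: <CAS ...>
            return Guess.UNKNOWN;
          }
        }
      } finally {
        reader.close();
      }
    } catch (XMLStreamException e) {
      // Not well-formed XML (e.g. a binary CAS): report UNKNOWN.
    }
    return Guess.UNKNOWN;
  }
}
```

Only the first start element is examined, so the cost is bounded by the size of the prolog rather than by a guessed lookahead limit.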