Hi,

I'll try to find the time to make these changes this week, or next week at the latest.


Btw, input stream sniffing to distinguish XMI from XCAS is currently not
supported. There could be a lot of text (e.g., a license header) before the
relevant element occurs.
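
If sniffing is added later, one way to cope with an arbitrarily long license comment is to let a streaming XML parser skip everything up to the first start element, instead of inspecting a fixed-size prefix. This is only a sketch under stated assumptions: that XMI documents have the root element `xmi:XMI` in the namespace `http://www.omg.org/XMI` and that XCAS documents have an un-namespaced root element named `CAS` (both should be checked against the serializers). `SniffedFormat` and `XmlFormatSniffer` are hypothetical names, not UIMA API; the JDK's StAX (`javax.xml.stream`) does the parsing.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical result type for this sketch (UIMA's SerialFormat could be used instead). */
enum SniffedFormat { XMI, XCAS, UNKNOWN }

/** Hypothetical helper, not part of the UIMA API. */
final class XmlFormatSniffer {
  // Assumption: UIMA XMI output binds the "xmi" prefix to this namespace.
  private static final String XMI_NS = "http://www.omg.org/XMI";

  private XmlFormatSniffer() {}

  /**
   * Streams forward to the first start element, so the XML declaration, comments
   * (e.g. a long license header), processing instructions and whitespace are
   * skipped regardless of length. The stream is consumed; the caller must reopen it.
   */
  static SniffedFormat sniff(InputStream in) {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Sniffing never needs DTDs or external entities.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    XMLStreamReader reader = null;
    try {
      reader = factory.createXMLStreamReader(in);
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT) {
          return classify(reader.getNamespaceURI(), reader.getLocalName());
        }
      }
      return SniffedFormat.UNKNOWN;
    } catch (XMLStreamException e) {
      // Not well-formed XML before the root element: let the caller decide.
      return SniffedFormat.UNKNOWN;
    } finally {
      if (reader != null) {
        try {
          reader.close(); // frees the parser; does not close the underlying InputStream
        } catch (XMLStreamException ignored) {
          // nothing useful to do while sniffing
        }
      }
    }
  }

  // Assumption: XCAS documents use an un-namespaced root element named "CAS".
  private static SniffedFormat classify(String namespace, String localName) {
    if ("XMI".equals(localName) && XMI_NS.equals(namespace)) {
      return SniffedFormat.XMI;
    }
    if ("CAS".equals(localName) && (namespace == null || namespace.isEmpty())) {
      return SniffedFormat.XCAS;
    }
    return SniffedFormat.UNKNOWN;
  }
}
```

Because the parser consumes the stream, a caller would typically sniff once, then reopen the file (or replay a buffered copy) before handing it to the actual deserializer.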


Best,


Peter


On 20.07.2016 at 14:19, Marshall Schor wrote:
> Hi,
>
> We can change the header, but:
>
> The changed header ought to be "readable" by previous versions of UIMA.
>
> XMI and XCAS do not currently have special headers, and if we added them,
> those formats could not be read by older versions of UIMA. Those formats do,
> however, contain sufficiently distinctive initial strings to tell them apart.
>
> The XMI format is also specified in an OASIS standard, which the UIMA project
> is said to (mostly) follow: http://uima.apache.org/uima-specification.html
>
> For binary serializations, I think there's room in the header for an extra bit,
> which, if on, could indicate that a type system was included. I think it would
> be good to have a header extension, when type systems are included, to specify
> the format and version of the type system serialization.
>
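
The spare-bit-plus-extension idea above could look roughly like this. It is a minimal sketch only: the 4-byte flags word, the bit position (`1 << 7`), and the one-byte format id plus two-byte version are invented for illustration and do not describe the actual CommonSerDes layout; `SketchHeader` is a hypothetical class. What it shows is the compatibility property: when the bit is clear, nothing extra is written, so such headers are unchanged for readers that predate the extension.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical header layout illustrating "spare bit + extension".
 * It is NOT the real CommonSerDes header format.
 */
final class SketchHeader {
  /** Hypothetical spare bit meaning "a type system is included". */
  static final int TYPE_SYSTEM_INCLUDED = 1 << 7;

  final int flags;
  final int tsFormat;  // type system serialization format id; meaningful only if bit set
  final int tsVersion; // version of that format; meaningful only if bit set

  SketchHeader(int flags, int tsFormat, int tsVersion) {
    this.flags = flags;
    this.tsFormat = tsFormat;
    this.tsVersion = tsVersion;
  }

  boolean typeSystemIncluded() {
    return (flags & TYPE_SYSTEM_INCLUDED) != 0;
  }

  void write(DataOutputStream out) throws IOException {
    out.writeInt(flags);
    if (typeSystemIncluded()) {
      // The extension is written only when the bit is set, so headers without
      // a type system stay byte-for-byte identical to the old layout.
      out.writeByte(tsFormat);
      out.writeShort(tsVersion);
    }
  }

  static SketchHeader read(DataInputStream in) throws IOException {
    int flags = in.readInt();
    if ((flags & TYPE_SYSTEM_INCLUDED) == 0) {
      return new SketchHeader(flags, -1, -1);
    }
    int format = in.readUnsignedByte();
    int version = in.readUnsignedShort();
    return new SketchHeader(flags, format, version);
  }

  byte[] toBytes() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static SketchHeader fromBytes(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return read(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Data written with the bit set would still not be readable by older versions, which fits the idea that only serializations that actually carry a type system use the extended header.
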
> Most serializations in core UIMA have not included the type system. The one
> which does is CASCompleteSerializer. This is a "serializable" (using standard
> Java serialization) object containing serializable forms of the CAS and Type
> System.
>
> Regarding making methods in CommonSerDes public:
>
> It is fine to make them public in the sense that they are accessible from other
> packages, not in a sub-type hierarchy. But I think it is best not to put
> CommonSerDes in a package intended for end users, because the end-user UIMA
> APIs should be (as much as possible) stable over a long time period. Details
> of how we evolve headers, etc., should not disturb end users, if possible;
> keeping these public but in packages with names like xxx.impl or
> xyz.internal.abc, etc., is the way this has traditionally been done. It allows
> us to evolve these without affecting end-user APIs.
>
> Just to be clear: I would not consider uimaFIT and Ruta to be "end-users", as
> they are developed within the UIMA project, and we are willing to evolve them
> together with UIMA core changes.
>
> We don't have a deadline for the next release, but it's mostly ready to go, and
> will solve a significant issue for people wanting to upgrade their Eclipse to
> Neon :-).
>
> -Marshall
>
> On 7/20/2016 5:03 AM, Peter Klügl wrote:
>> OK, after looking at the code, I must admit that there is much more to do
>> than I expected. We first need to discuss several things:
>>
>> - can we change the header at all?
>>
>> - do we support type system inclusion in the header?
>>
>> - do we support type system inclusion in the serialized files?
>>
>> - which serial formats are which?
>>
>> - can we make the methods in CommonSerDes public?
>>
>>
>> What is the deadline for the release? I am quite busy with work
>> until next Wednesday :-(
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> On 19.07.2016 at 22:39, Marshall Schor wrote:
>>> Great.
>>>
>>> There's now also common code for writing/reading UIMA serialization headers in
>>>
>>> CommonSerDes (in org.apache.uima.cas.impl)
>>>
>>> This includes the extensions to support versioning the serializations, which
>>> will be needed starting with the next release because a bug fix slightly
>>> changes the serialized form for **delta binary** CAS.
>>>
>>> So, it would be good to use that rather than have another separate header
>>> reader/writer to maintain.
>>>
>>> -Marshall
>>>
>>>
>>> On 7/19/2016 4:13 PM, Peter Klügl wrote:
>>>> Ah, I didn't know about that enum. I'll adapt the code and the enum.
>>>>
>>>> On 19.07.2016 at 20:09, Marshall Schor wrote:
>>>>> We already have an enum in the core for various serial formats. The class is
>>>>>
>>>>> public enum SerialFormat {
>>>>>    UNKNOWN,
>>>>>    XCAS,         // with reachability filtering
>>>>>    XMI,          // with reachability filtering
>>>>>    BINARY,       // no filtering
>>>>>    COMPRESSED,   // no filtering  (form 4)
>>>>>    COMPRESSED_FILTERED,   // with reachability and type and feature filtering (form 6)
>>>>>    COMPRESSED_PROJECTION, // with subset of views
>>>>> }
>>>>>
>>>>> (I don't think COMPRESSED_PROJECTION is in use...)
>>>>>
>>>>> This has been around for maybe 3 years. I would be in favor of considering
>>>>> using and/or extending this as needed, rather than having two format enums
>>>>> (that is, this one plus the proposed SerializationFormat class).
>>>>>
>>>>> -Marshall
>>>>>
>>>>> On 7/19/2016 2:49 AM, Peter Klügl wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> Yes, the class should be officially available to external code. I
>>>>>> already included it in the CAS Editor and in Ruta. I also plan to use it
>>>>>> in our in-house code. I'll change the enforcer rule.
>>>>>>
>>>>>>
>>>>>> I can write the docs, but any help is welcome since I do not know how
>>>>>> much spare time I will have for this during the rest of the week. I'll
>>>>>> take a look at where the documentation should be added; I haven't looked
>>>>>> at it for some time ;-)
>>>>>>
>>>>>>
>>>>>> I just chose the name of the class Richard contributed since I thought
>>>>>> it was really suitable. Then I also noticed the uimaFIT class. This is
>>>>>> not an ideal situation, but I would not change the name because of it.
>>>>>>
>>>>>>
>>>>>> I would not split the API from the implementation. I do not see any
>>>>>> advantages right now. The class is just a simple utility class with only
>>>>>> static methods, like CasCreationUtils (which is also not split).
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 18.07.2016 at 22:26, Marshall Schor wrote:
>>>>>>> This is OK with me. I can even volunteer to write the docs (but am happy
>>>>>>> to have others do it :-) ).
>>>>>>>
>>>>>>> I'll wait to hear about the split (if any) between the public API and the
>>>>>>> impl.
>>>>>>>
>>>>>>> And we'll need to change the next version number from 2.8.2 to 2.9.0,
>>>>>>> since this is that kind of change.
>>>>>>>
>>>>>>> Is everyone OK with all of this?
>>>>>>>
>>>>>>> -Marshall
>>>>>>>
>>>>>>> On 7/18/2016 2:39 PM, Richard Eckart de Castilho wrote:
>>>>>>>> I believe the intention is that this class becomes part of the public API.
>>>>>>>>
>>>>>>>> Also, my understanding is that it would do a superset of what the
>>>>>>>> uimaFIT class by the same name does. We could then probably deprecate
>>>>>>>> the respective uimaFIT class and suggest using the core class instead.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> -- Richard
>>>>>>>>
>>>>>>>>> On 18.07.2016, at 20:30, Marshall Schor wrote:
>>>>>>>>>
>>>>>>>>> This is a new class added to the uimaj-core project, in the
>>>>>>>>> org.apache.uima.util package. This is fine if this is to be part of the
>>>>>>>>> official public APIs supported by UIMA going forward; but if that is the
>>>>>>>>> case, it should probably be documented in the UIMA docs, and we'd have to
>>>>>>>>> change the version number (due to enforcer rules).
>>>>>>>>>
>>>>>>>>> If this is more of an internal-use utility, then it should be in one of
>>>>>>>>> the internal-use packages, such as
>>>>>>>>>
>>>>>>>>>    org.apache.uima.internal.util
>>>>>>>>>
>>>>>>>>> This class has a name similar to a uimaFIT class; are these related?
>>>>>>>>>
>>>>>>>>> If some of the APIs are to be permanent and public and part of the
>>>>>>>>> official public APIs, but some are internal implementation details,
>>>>>>>>> please consider using an interface and an ".impl" (or equivalent)
>>>>>>>>> approach; packages which support these are:
>>>>>>>>>
>>>>>>>>>    org.apache.uima.util  and
>>>>>>>>>
>>>>>>>>>    org.apache.uima.util.impl
>>>>>>>>>
>>>>>>>>> --------------
>>>>>>>>>
>>>>>>>>> If this is only an internal kind of change, not intended to affect the
>>>>>>>>> official UIMA APIs, then moving the class to the internal.util package
>>>>>>>>> will fix the "enforcer" error the build is currently getting.
>>>>>>>>>
>>>>>>>>> -Marshall
>>>>>>>>>
