On 20.07.2016, at 11:03, Peter Klügl <[email protected]> wrote:
>
> Ok, after looking at the code I must admit that there is much more to do
> than I expected. We first need to discuss several things:
>
> - can we change the header at all?

Afaik Marshall added a version field to the header, so it should be possible to
define a new version of the file format with an extended header.

> - do we support type system inclusion in the header?

Not sure what you mean by "in the header" vs "in the serialized files".

> - do we support type system inclusion in the serialized files?

With "serialized", do you mean the "Java serialized files" - or any of the
binary files?

I'm strongly in favor of allowing type system information to be embedded in the
binary/serialized files. Keeping the type system separate is very useful to
save space, but highly inconvenient when e.g. sending annotated documents
around. There are regularly posts on the mailing list where people try to
recover type system information from XMI files because they lost the original
type system description.

> - which serial format are which ones?

Not sure what your question is since you already added the new constants - and
the new ones make sense to me.

I believe the format IDs used in DKPro Core map as follows:

  "S"  -> SERIALIZED
  "S+" -> SERIALIZED_TS
  "0"  -> BINARY,                   // no filtering
  "4"  -> COMPRESSED,               // no filtering (form 4)
  "6"  -> COMPRESSED_FILTERED,      // with reachability and type and feature filtering (form 6)
  "6+" -> COMPRESSED_FILTERED_TS    // ~probably similar, not the same
  n/a  -> COMPRESSED_PROJECTION,    // with subset of views

Cheers,

-- Richard
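
P.S. In case it helps the discussion, here is a minimal sketch of how the ID mapping above could be expressed as a lookup table. The class `FormatIds`, the nested enum `Format` and the method `fromId` are hypothetical names made up for this illustration; the enum only mirrors the constant names from the list (with SERIALIZED spelled out) and is not the actual UIMA enum. COMPRESSED_PROJECTION has no DKPro Core ID, so it has no entry in the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FormatIds {
    // Hypothetical enum mirroring the constant names from the list above;
    // this is an illustration, not the real UIMA type.
    enum Format {
        SERIALIZED, SERIALIZED_TS, BINARY, COMPRESSED,
        COMPRESSED_FILTERED, COMPRESSED_FILTERED_TS, COMPRESSED_PROJECTION
    }

    // DKPro Core format ID -> format constant, in the order listed above.
    private static final Map<String, Format> BY_ID = new LinkedHashMap<>();
    static {
        BY_ID.put("S", Format.SERIALIZED);
        BY_ID.put("S+", Format.SERIALIZED_TS);
        BY_ID.put("0", Format.BINARY);
        BY_ID.put("4", Format.COMPRESSED);
        BY_ID.put("6", Format.COMPRESSED_FILTERED);
        BY_ID.put("6+", Format.COMPRESSED_FILTERED_TS);
        // COMPRESSED_PROJECTION is "n/a" in the list, so it gets no ID here.
    }

    // Resolve an ID string; unknown IDs are rejected instead of guessed.
    static Format fromId(String id) {
        Format format = BY_ID.get(id);
        if (format == null) {
            throw new IllegalArgumentException("Unknown format ID: " + id);
        }
        return format;
    }

    public static void main(String[] args) {
        System.out.println(fromId("6+"));
    }
}
```

Rejecting unknown IDs rather than falling back to a default is a choice made for this sketch only; a reader component might prefer to detect the format from the file header instead.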
