Hi Sébastien,

My advice is: don't over-engineer. I suggest limiting your design to three types:
- String (unicode string),
- VirtualString (constructor for String),
- ByteString (a raw sequence of bytes).

Give up on the VirtualByteString. It looks like over-engineering to me.

With the types above, you only need three fundamental operations:
- ToString: VirtualString -> String,
- Encode: String x Encoding -> ByteString,
- Decode: ByteString x Encoding -> String.

All the typical string operations should be on the type String. Use ByteString as a raw sequence of bytes with a minimal set of operations. It is not really a string type, but rather a storage type.

I don't know how String should be represented. A list of unicode characters looks nice conceptually, but it is a very costly representation. Ideally, you would store a string as an atomic data structure that still "behaves" like a list of characters. Maybe the unification of a string S with X|T could bind X to a (unicode) character and T to a proxy that represents S without its first character.

I must confess that I find the object model of Python strings very attractive. A string is an object with atomic storage, and it supports array-like operations (including iterators) when you need an explicit decomposition. And characters don't have a specific type: they are strings of length 1. It is simple, clean, pragmatic and efficient.

Those were my two cents.

Cheers,
Raphael

On Fri, Jul 6, 2012 at 4:07 PM, Sébastien Doeraene <sjrdoera...@gmail.com> wrote:
> Hi everybody,
>
> In this mail, I present a new definition for VirtualString, and a new
> concept for VirtualByteString. We would like your opinion on this
> definition, for (experimental) implementation in Mozart2.
>
> As you all know, in Oz there is this concept of VirtualString, which
> allows to build string-like things by concatenation, etc.
> If you have followed the previous discussion on Strings on this mailing
> list, we have decided to add (experimental) support for Unicode-enabled
> APIs and data types in Mozart2.
>
> Kenny TM~ has done a great job at implementing a compact UnicodeString
> data type, with many operations, among which the encoding/decoding to/from
> ByteStrings according to several encoding (latin1, utf8, utf16 and utf32,
> with BOM and LE/BE support where applicable). Note also that atoms are
> fully Unicode capable in Mozart2.
>
> In Mozart 1.4.0, ByteString had the role of "compact string"
> representation, for text was always latin1-encoded. UnicodeString should
> take this role for text data from now on. ByteString is still useful, to
> compactly store sequences of bytes (for I/O, e.g.).
>
> Interestingly, this opens a whole new possible view of what is a
> VirtualString. Before this concept was sufficient to capture both text and
> binary sequences. Now it is not true anymore. So, we would like to review
> the definition of VirtualString, and introduce a new concept which is
> VirtualByteString. We would define them as follows:
>
> <VirtualString, aka VS> ::=
>     UnicodeString
>   | Atom, except '#' and 'nil'
>   | list of UnicodeChar (data type not implemented yet, but will come)
>   | Int (implicitly converted to decimal notation)
>   | Float (idem)
>   | #-tuple of zero to many <VS>'es (concatenation)
>   | decode(<Encoding> <VBS>)
>   | list of integers (implicitly interpreted as latin1 encoded - for
>     compatibility)
>   | ByteString (implicitly interpreted as latin1 encoded - for
>     compatibility)
>
> <VirtualByteString, aka VBS> ::=
>     ByteString
>   | list of integers which are bytes
>   | #-tuple of zero to many <VBS>'es
>   | encode(<Encoding> <VS>)
>
> <Encoding> ::= spec of an encoding, possible format: list of {latin1,
> utf8, utf16, utf32, littleEndian, bigEndian, bom}
>
> The mutually recursive definition of VS and VBS allows elaborate
> constructions.
>
> Given these definitions, APIs that expect textual data will always accept
> a VirtualString, and APIs that expect binary data (e.g., I/O) will accept a
> VirtualByteString.
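
To make the mutual recursion in the quoted grammar concrete, here is a minimal Python sketch (not Oz, and not the Mozart2 implementation). Everything in it is an assumption made for illustration: the names `Decode`, `Encode`, `vs_to_str` and `vbs_to_bytes` are hypothetical, #-tuples are modeled as Python tuples, UnicodeString and atoms as `str`, ByteString as `bytes`, and `<Encoding>` is reduced to a Python codec name instead of the proposed list of flags.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Decode:
    """Hypothetical stand-in for decode(<Encoding> <VBS>)."""
    encoding: str  # assumption: a Python codec name such as "utf-8"
    vbs: Any


@dataclass(frozen=True)
class Encode:
    """Hypothetical stand-in for encode(<Encoding> <VS>)."""
    encoding: str
    vs: Any


def vs_to_str(vs: Any) -> str:
    """Flatten a VirtualString-like value into a Python str."""
    if isinstance(vs, str):  # UnicodeString or atom
        return vs
    if isinstance(vs, bool):  # bool is an int subclass in Python; reject it
        raise TypeError("bool is not a VirtualString")
    if isinstance(vs, (int, float)):  # Python's formatting, not Oz's
        return str(vs)
    if isinstance(vs, tuple):  # '#'-tuple: concatenation
        return "".join(vs_to_str(part) for part in vs)
    if isinstance(vs, Decode):  # recursion into the binary grammar
        return vbs_to_bytes(vs.vbs).decode(vs.encoding)
    if isinstance(vs, bytes):  # ByteString: latin1 for compatibility
        return vs.decode("latin-1")
    if isinstance(vs, list):
        if all(isinstance(c, str) for c in vs):  # list of characters
            return "".join(vs)
        return bytes(vs).decode("latin-1")  # list of integers: latin1
    raise TypeError(f"not a VirtualString: {vs!r}")


def vbs_to_bytes(vbs: Any) -> bytes:
    """Flatten a VirtualByteString-like value into Python bytes."""
    if isinstance(vbs, bytes):
        return vbs
    if isinstance(vbs, list):  # list of integers in range(256)
        return bytes(vbs)
    if isinstance(vbs, tuple):  # '#'-tuple: concatenation
        return b"".join(vbs_to_bytes(part) for part in vbs)
    if isinstance(vbs, Encode):  # recursion into the text grammar
        return vs_to_str(vbs.vs).encode(vbs.encoding)
    raise TypeError(f"not a VirtualByteString: {vbs!r}")
```

For example, `vs_to_str(("caf", Decode("utf-8", [0xC3, 0xA9])))` decodes the two bytes as UTF-8 before concatenating, while a bare list of integers on the text side is read as latin1.
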
>
> What do you think of this approach to re-unifying textual virtual strings
> and binary virtual strings?
>
> Cheers,
> Sébastien
>
> _________________________________________________________________________________
> mozart-hackers mailing list
> mozart-hackers@mozart-oz.org
> http://www.mozart-oz.org/mailman/listinfo/mozart-hackers
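
For comparison, the three-type design suggested at the top of this reply (ToString, Encode, Decode, plus a proxy for "S without its first character") can be sketched in Python as follows. This is only an illustration under stated assumptions: `to_string`, `encode`, `decode` and `Tail` are hypothetical names, VirtualString is reduced to nested tuples of `str`, `int` and `float`, and an encoding is a Python codec name.

```python
from typing import Any, Optional, Tuple


def to_string(vs: Any) -> str:
    """ToString: VirtualString -> String (tuples model concatenation)."""
    if isinstance(vs, str):
        return vs
    if isinstance(vs, (int, float)) and not isinstance(vs, bool):
        return str(vs)
    if isinstance(vs, tuple):
        return "".join(to_string(part) for part in vs)
    raise TypeError(f"not a VirtualString: {vs!r}")


def encode(s: str, encoding: str) -> bytes:
    """Encode: String x Encoding -> ByteString."""
    return s.encode(encoding)


def decode(b: bytes, encoding: str) -> str:
    """Decode: ByteString x Encoding -> String."""
    return b.decode(encoding)


class Tail:
    """Proxy for a string minus its first `start` characters.

    The underlying string is shared, never copied, so taking the
    tail repeatedly stays cheap.
    """

    def __init__(self, source: str, start: int = 0) -> None:
        self._source = source
        self._start = start

    def uncons(self) -> Optional[Tuple[str, "Tail"]]:
        """Mimic matching against X|T: return (head, tail proxy) or None."""
        if self._start >= len(self._source):
            return None
        head = self._source[self._start]  # a string of length 1
        return head, Tail(self._source, self._start + 1)

    def __str__(self) -> str:
        # Materialize the remaining characters only on demand.
        return self._source[self._start:]
```

Here the string stays an atomic object, and the head returned by `uncons` is itself a string of length 1, which mirrors the Python model described in the reply.
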