Hi Sébastien,

My advice is: don't over-engineer.  I suggest limiting your design to three
types:

   - String (Unicode string),
   - VirtualString (constructor for String),
   - ByteString (a raw sequence of bytes).

Give up on VirtualByteString.  It looks like over-engineering to me.
With the types above, you only need three fundamental operations:

   - ToString: VirtualString -> String,
   - Encode: String x Encoding -> ByteString,
   - Decode: ByteString x Encoding -> String.
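
To make this concrete, here is a rough sketch of the three operations in
Python.  It is only an illustration, not a proposal for the Mozart2 API: the
names to_string, encode and decode are mine, a Python tuple stands in for an
Oz #-tuple, Python's str and bytes stand in for String and ByteString, and
numbers are formatted the Python way rather than the Oz way.

```python
def to_string(vs):
    """ToString: VirtualString -> String (flatten by concatenation)."""
    if isinstance(vs, str):
        return vs
    if isinstance(vs, bool):
        # bool is a subclass of int in Python; it is not part of this sketch
        raise TypeError(f"not a virtual string: {vs!r}")
    if isinstance(vs, (int, float)):
        return str(vs)  # Python formatting, assumed here for simplicity
    if isinstance(vs, tuple):
        # a tuple plays the role of an Oz #-tuple: concatenate the parts
        return "".join(to_string(part) for part in vs)
    raise TypeError(f"not a virtual string: {vs!r}")


def encode(s, encoding):
    """Encode: String x Encoding -> ByteString."""
    return s.encode(encoding)


def decode(bs, encoding):
    """Decode: ByteString x Encoding -> String."""
    return bs.decode(encoding)


text = to_string(("caf\u00e9 costs ", 3, " euros"))
raw = encode(text, "utf-8")
round_trip = decode(raw, "utf-8")
```

Note that nothing here needs a VirtualByteString: bytes only appear as the
result of encode or the argument of decode.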

All the typical string operations should be on the type String.  Use
ByteString as a raw sequence of bytes with a minimal set of operations.
ByteString is not really a string type, but rather a storage type.

I don't know how String should be represented.  A list of Unicode
characters looks nice conceptually, but it is a very costly
representation.  Ideally, you would like to store a string as an atomic
data structure, but it should "behave" like a list of characters.  Maybe
the unification of a string S with X|T could bind X to a (Unicode)
character and T to a proxy that represents S without its first character.
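
Here is a small Python sketch of that proxy idea, under my own assumptions:
StringView and uncons are hypothetical names, and uncons only imitates what
unification with X|T would do.  The point is that the tail shares the atomic
storage instead of copying it.

```python
class StringView:
    """A suffix of an atomically stored string that decomposes like a list."""

    __slots__ = ("_store", "_start")

    def __init__(self, store, start=0):
        self._store = store   # the shared, atomic storage
        self._start = start   # offset of the first visible character

    def is_nil(self):
        return self._start >= len(self._store)

    def uncons(self):
        """Imitate unification with X|T: return (X, T)."""
        if self.is_nil():
            raise ValueError("the empty string does not unify with X|T")
        head = self._store[self._start]
        tail = StringView(self._store, self._start + 1)  # no copy of the text
        return head, tail

    def __str__(self):
        return self._store[self._start:]


s = StringView("h\u00e9llo")
x, t = s.uncons()
```

Each decomposition step allocates one small proxy, but the characters
themselves are never duplicated.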

I must confess that I find the object model of Python strings very
attractive.  A string is an object with an atomic storage, and it supports
array-like operations (including iterators) when you need an explicit
decomposition.  And characters don't have a specific type: they are strings
of length 1.  It is simple, clean, pragmatic and efficient.
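
For those who do not use Python, this is what I mean (a short illustration
using only built-in string operations):

```python
s = "na\u00efve"                 # one string object, stored atomically

first = s[0]                     # indexing yields a string of length 1
chars = list(s)                  # explicit decomposition, only on demand
upper = [c.upper() for c in s]   # iteration goes character by character
last_two = s[-2:]                # slicing yields another string

# there is no separate character type
same_type = type(first) is type(s)
```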

Those were my two cents.

Cheers,
Raphael

On Fri, Jul 6, 2012 at 4:07 PM, Sébastien Doeraene <sjrdoera...@gmail.com> wrote:

> Hi everybody,
>
> In this mail, I present a new definition for VirtualString, and a new
> concept for VirtualByteString. We would like your opinion on this
> definition, for (experimental) implementation in Mozart2.
>
> As you all know, in Oz there is this concept of VirtualString, which
> allows building string-like things by concatenation, etc.
> If you have followed the previous discussion on Strings on this mailing
> list, we have decided to add (experimental) support for Unicode-enabled
> APIs and data types in Mozart2.
>
> Kenny TM~ has done a great job at implementing a compact UnicodeString
> data type, with many operations, among which the encoding/decoding to/from
> ByteStrings according to several encodings (latin1, utf8, utf16 and utf32,
> with BOM and LE/BE support where applicable). Note also that atoms are
> fully Unicode capable in Mozart2.
>
> In Mozart 1.4.0, ByteString had the role of "compact string"
> representation, for text was always latin1-encoded. UnicodeString should
> take this role for text data from now on. ByteString is still useful, to
> compactly store sequences of bytes (for I/O, e.g.).
>
> Interestingly, this opens a whole new possible view of what a
> VirtualString is. Before, this concept was sufficient to capture both text
> and binary sequences. Now that is no longer true. So, we would like to
> review the definition of VirtualString, and introduce a new concept, which
> is VirtualByteString. We would define them as follows:
>
> <VirtualString, aka VS> ::=
>      UnicodeString
>    | Atom, except '#' and 'nil'
>    | list of UnicodeChar (data type not implemented yet, but will come)
>    | Int (implicitly converted to decimal notation)
>    | Float (idem)
>    | #-tuple of zero to many <VS>'es (concatenation)
>    | decode(<Encoding> <VBS>)
>    | list of integers (implicitly interpreted as latin1 encoded - for compatibility)
>    | ByteString (implicitly interpreted as latin1 encoded - for compatibility)
>
> <VirtualByteString, aka VBS> ::=
>      ByteString
>    | list of integers which are bytes
>    | #-tuple of zero to many <VBS>'es
>    | encode(<Encoding> <VS>)
>
> <Encoding> ::= spec of an encoding, possible format: list of {latin1,
> utf8, utf16, utf32, littleEndian, bigEndian, bom}
>
> The mutually recursive definition of VS and VBS allows elaborate
> constructions.
>
> Given these definitions, APIs that expect textual data will always accept
> a VirtualString, and APIs that expect binary data (e.g., I/O) will accept a
> VirtualByteString.
>
> What do you think of this approach to re-unifying textual virtual strings
> and binary virtual strings?
>
> Cheers,
> Sébastien
>
>
> _________________________________________________________________________________
> mozart-hackers mailing list
> mozart-hackers@mozart-oz.org
> http://www.mozart-oz.org/mailman/listinfo/mozart-hackers
>
