Re: [bitc-dev] String encoding, again

William ML Leslie Tue, 15 Mar 2011 17:28:11 -0700

On 16 March 2011 09:06, Jonathan S. Shapiro <[email protected]> wrote:
> So I'm looking at string encoding issues again, and concluding that it's
> just as icky as it was the last time I looked. I've looked at Python, and I
> do think they did right by declaring that I/O happens in units of bytes with
> conversion occurring at a layer above the I/O layer.


That is the intention, but don't look too closely: Python 3 ignores
that interface in several places, such as the population of sys.argv
(which it happily pretends is a list of text strings).
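
A small sketch of what I mean. On POSIX, Python 3 decodes the raw
argument bytes using the filesystem encoding with the surrogateescape
error handler, so the bytes can usually be recovered with os.fsencode.
The helper name argv_as_bytes is made up for illustration; it is not
part of any library.

```python
import os
import sys


def argv_as_bytes(argv=None):
    """Recover byte-level arguments from the str values Python 3 provides.

    Hypothetical helper: os.fsencode reverses the filesystem-encoding
    decode that was applied when sys.argv was populated.
    """
    if argv is None:
        argv = sys.argv
    return [os.fsencode(arg) for arg in argv]


if __name__ == "__main__":
    # Every element of sys.argv is already text, not bytes.
    for text, raw in zip(sys.argv, argv_as_bytes()):
        print(type(text).__name__, repr(text), "->", repr(raw))
```

So the byte layer is still reachable, but only by undoing a conversion
the runtime performed before the program got a say.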

> Separately, I've
> concluded (reluctantly) that we really do need constant-time string
> indexing, and that I've been a dolt about that.

A month or two after convincing me that there is no need for an
application programming language to provide constant-time string
indexing, you've changed your mind too?  What prompted that?
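
For anyone following along, this is roughly the cost being argued
about. It is a sketch under my own assumptions (valid input, indexing
by code point, function names invented here), not a proposal for the
BitC runtime: a variable-width encoding has to scan, a fixed-width one
can use arithmetic.

```python
def utf8_code_point_at(data: bytes, index: int) -> str:
    """Return the index-th code point of valid UTF-8 by scanning: O(n)."""
    seen = -1
    for pos, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx continue a sequence;
        # any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(index)


def utf32_code_point_at(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-32-LE by arithmetic: O(1)."""
    if not 0 <= index < len(data) // 4:
        raise IndexError(index)
    return data[4 * index : 4 * index + 4].decode("utf-32-le")
```

The UTF-32 version pays for its constant-time lookup with four bytes
per code point, which is the trade-off in question.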

> One approach would be to introduce an opaque reference type NativeString,
> and a set of runtime operations that will produce NativeString from String
> (and the other way as well), and possibly NativeString from byte[]. The
> reason to make NativeString strictly opaque is error-prevention. If we
> support indexing operations on NativeString, we invite people to write code
> that assumes a particular encoding of NativeString, and that code will run
> incorrectly (or worse: appear to run correctly) on other platforms.
>
> The alternative is to introduce distinguished string types for the commonly
> deployed native string representations: JavaString/JavaCodeUnit and
> CliString/CliCodeUnit. This preserves the ability to write high-performance
> code for a particular target environment without abandoning error diagnosis
> when the code is ported. [It might be better to choose names that describe
> the encodings; that's a separate issue.]  I resist this approach at the
> moment partly because I fear a proliferation of representation-oriented
> types and partly because the semantics of strings in both runtime systems
> seem hopelessly boogered.
>
> I'm inclined to favor the NativeString approach here, but I'm open to input.
> Does somebody (anybody!) see a cleaner way out here?

Why not provide each interface?  Portable code can use NativeString
for FFI, platform-specific optimisations can use the
[Cli|Jvm|UTF32]String implementation, VM implementors can implement
their own representation via byte[], and everybody is happy.  What
problem are you trying to solve?
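
To illustrate how the two could coexist, here is a rough sketch in
Python rather than BitC. Everything in it is an assumption of mine:
the class names echo the ones above, UTF-16-LE merely stands in for
"whatever the platform uses", and Utf32String represents the
encoding-specific family.

```python
class NativeString:
    """Opaque handle: convertible to and from str, but not indexable."""

    __slots__ = ("_raw",)
    _ENCODING = "utf-16-le"  # assumption: stand-in for the platform encoding

    def __init__(self, raw: bytes):
        self._raw = raw

    @classmethod
    def from_string(cls, text: str) -> "NativeString":
        return cls(text.encode(cls._ENCODING))

    def to_string(self) -> str:
        return self._raw.decode(self._ENCODING)


class Utf32String:
    """Encoding-specific type: indexing is exposed and constant-time."""

    def __init__(self, text: str):
        self._raw = text.encode("utf-32-le")

    def __len__(self) -> int:
        return len(self._raw) // 4

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._raw[4 * index : 4 * index + 4].decode("utf-32-le")
```

Portable code only ever sees NativeString and cannot index it, so it
cannot bake in an encoding assumption; code that names Utf32String has
opted into one representation explicitly.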

-- 
William Leslie

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
