On 16 March 2011 09:06, Jonathan S. Shapiro <[email protected]> wrote:
> So I'm looking at string encoding issues again, and concluding that it's
> just as icky as it was the last time I looked. I've looked at Python, and I
> do think they did right by declaring that I/O happens in units of bytes with
> conversion occurring at a layer above the I/O layer.

That is the intention, but don't look too closely: Python 3 ignores that
interface in several places, such as the population of sys.argv (which
it happily pretends is a list of text strings).

> Separately, I've
> concluded (reluctantly) that we really do need constant-time string
> indexing, and that I've been a dolt about that.

A month or two after convincing me that there is no need for an
application programming language to provide constant-time string
indexing, you've changed your mind too? What prompted that?

> One approach would be to introduce an opaque reference type NativeString,
> and a set of runtime operations that will produce NativeString from String
> (and the other way as well), and possibly NativeString from byte[]. The
> reason to make NativeString strictly opaque is error-prevention. If we
> support indexing operations on NativeString, we invite people to write code
> that assumes a particular encoding of NativeString, and that code will run
> incorrectly (or worse: appear to run correctly) on other platforms.
>
> The alternative is to introduce distinguished string types for the commonly
> deployed native string representations: JavaString/JavaCodeUnit and
> CliString/CliCodeUnit. This preserves the ability to write high-performance
> code for a particular target environment without abandoning error diagnosis
> when the code is ported. [It might be better to choose names that describe
> the encodings; that's a separate issue.] I resist this approach at the
> moment partly because I fear a proliferation of representation-oriented
> types and partly because the semantics of strings in both runtime systems
> seems hopelessly boogered.
>
> I'm inclined to favor the NativeString approach here, but I'm open to input.
> Does somebody (anybody!) see a cleaner way out here?

Why not provide each interface?
Portable code can use NativeString for FFI, platform-specific
optimisations can use the [Cli|Jvm|UTF32]String implementation, VM
implementors can implement their own representation via byte[], and
everybody is happy. What problem are you trying to solve?

--
William Leslie
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
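
To make the opaque NativeString option from the quoted proposal a little more concrete, here is a minimal sketch in Python. It is an illustration only, not BitC code: the method names from_string, from_bytes and to_string, the helper supports_indexing, and the default "utf-16-le" encoding are all assumptions made up for this example. The idea is that the type converts to and from String and byte[] but offers no indexing, so calling code cannot quietly depend on one platform's code-unit layout.

```python
class NativeString:
    """Opaque handle for text held in a platform's native encoding.

    Hypothetical sketch: there is deliberately no __getitem__, __len__ or
    __iter__, so callers cannot rely on the code-unit layout.
    """

    __slots__ = ("_raw", "_encoding")

    def __init__(self, raw: bytes, encoding: str):
        self._raw = raw
        self._encoding = encoding

    @classmethod
    def from_string(cls, text: str, encoding: str = "utf-16-le") -> "NativeString":
        # String -> NativeString: encode once, at the boundary.
        return cls(text.encode(encoding), encoding)

    @classmethod
    def from_bytes(cls, raw: bytes, encoding: str) -> "NativeString":
        # byte[] -> NativeString: decode once to reject malformed input early.
        raw.decode(encoding)
        return cls(bytes(raw), encoding)

    def to_string(self) -> str:
        # NativeString -> String: the only way to look at the contents.
        return self._raw.decode(self._encoding)


def supports_indexing(value) -> bool:
    """Return False if value[0] is rejected with TypeError, True otherwise."""
    try:
        value[0]
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    native = NativeString.from_string("G clef: \U0001D11E")
    print(native.to_string() == "G clef: \U0001D11E")
    print(supports_indexing(native))
```

Platform-specific types along the lines of CliString or JavaString would sit beside this and expose indexing over their own code units; this sketch covers only the portable, opaque half of the suggestion.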

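
On the sys.argv point above, a short sketch of how the underlying bytes can still be recovered even though Python 3 presents the arguments as text. It assumes CPython 3.2 or later on a POSIX system, where command-line arguments are decoded with the filesystem encoding and the surrogateescape error handler; argv_as_bytes is a hypothetical helper name, not a standard function.

```python
import os
import sys


def argv_as_bytes(argv=None):
    """Re-encode text arguments back to bytes (sketch; assumes POSIX)."""
    if argv is None:
        argv = sys.argv
    # os.fsencode reverses the filesystem-encoding decode applied at startup.
    return [os.fsencode(arg) for arg in argv]


if __name__ == "__main__":
    for raw in argv_as_bytes():
        print(repr(raw))
```

So the byte-level view is still reachable, but only by undoing a conversion that the interpreter has already applied, which is the layering leak being described.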