On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Changing the APIs would be much work, although perhaps not impossible
> > for Python 3000. For example, Raymond Hettinger's partition() API
> > doesn't refer to indices at all, and can replace many uses of find()
> > or index().
>
> I think Neil's proposal is not to make them go away, but to implement
> them less efficiently. For example, if the internal representation
> is UTF-8, indexing requires linear time, as opposed to constant time.
> If the internal representation is UTF-16, and you have a flag to
> indicate whether there are any surrogates in the string, indexing
> is constant if the flag is false, else linear.
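To make the cost difference concrete, here is a minimal sketch of code point indexing over the two representations described above. It is an illustration only, not how any Python implementation stores strings; the helper names (utf8_index, utf16_index, utf16_units) and the has_surrogates flag are hypothetical, and negative indices are deliberately left out.

```python
import struct


def _seq_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def utf8_index(buf, i):
    """Return code point number i of UTF-8 bytes `buf`. Linear: scans from the start."""
    if i < 0:
        raise IndexError("negative indices are not supported in this sketch")
    pos = 0
    for _ in range(i):
        if pos >= len(buf):
            raise IndexError("code point index out of range")
        pos += _seq_len(buf[pos])
    if pos >= len(buf):
        raise IndexError("code point index out of range")
    return buf[pos:pos + _seq_len(buf[pos])].decode("utf-8")


def utf16_units(s):
    """Hypothetical helper: the 16-bit code units of `s` as a list of ints."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf16_index(units, has_surrogates, i):
    """Return code point number i. Constant time if the flag is false, else linear."""
    if i < 0:
        raise IndexError("negative indices are not supported in this sketch")
    if not has_surrogates:
        return chr(units[i])  # one code unit per code point: direct lookup
    pos = 0
    for _ in range(i):  # must skip over surrogate pairs one by one
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    hi = units[pos]
    if 0xD800 <= hi <= 0xDBFF:
        lo = units[pos + 1]
        return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
    return chr(hi)
```

Both functions return the same answers as ordinary indexing on the decoded string; the point is only that the loop in each slow path grows with the index, which is what makes s[i] linear in these representations.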
I understand all that. My point is that it's a bad idea to offer an
indexing operation that isn't O(1).

> > Perhaps we could provide a different kind of API to support the
> > latter, perhaps based on a mutable character buffer data type without
> > direct indexing?
>
> There are different design goals conflicting here:
> - some think: "all my data is ASCII, so I want to only use one
>   byte per character".
> - others think: "all my data goes to the Windows API, so I want
>   to use 2 bytes per character".
> - yet others think: "I want all of Unicode, with proper, efficient
>   indexing, so I want four bytes per char".

I doubt the last one though. Probably they really don't want efficient
indexing; they want to perform higher-level operations that currently
are only possible using efficient indexing or slicing. With the right
API, perhaps they could work just as efficiently with an internal
representation of UTF-8.

> It's not so much a matter of API as a matter of internal
> representation. The API doesn't have to change (except for the
> very low-level C API that directly exposes Py_UNICODE*, perhaps).

I think the API should reflect the representation *to some extent*;
namely, it shouldn't claim to have operations that are typically thought
of as O(1) but can only be implemented as O(n). An internal
representation of UTF-8 might make everyone happy except heavy Windows
users, but it requires changes to the API so people won't be writing
Python 2.x-style string-slinging code.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
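As an illustration of the kind of index-free API discussed in this thread, here is a small sketch contrasting find() plus slicing with partition(). It assumes partition(sep) returns a (head, sep, tail) triple, with the whole string as head when the separator is absent, as in Raymond's proposal; the two header-splitting functions are hypothetical names invented for this example.

```python
def split_header_find(line):
    """Index-based style: locate the colon, then slice around it."""
    i = line.find(":")
    if i < 0:
        return line, ""
    return line[:i], line[i + 1:].strip()


def split_header_partition(line):
    """Index-free style: no integer offsets ever reach the caller."""
    name, _sep, value = line.partition(":")
    return name, value.strip()
```

The second version never exposes a position in the string, so it would not care whether the underlying representation allows O(1) indexing.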