> From: Marko Rauhamaa <[email protected]>
> Cc: [email protected]
> Date: Thu, 16 Feb 2017 18:35:48 +0200
>
> Eli Zaretskii <[email protected]>:
>
> > You assume that Emacs concatenates strings by just splicing its bytes.
> > But that's a far cry from what Emacs does, precisely to countermand
> > such problems.
>
> Good to hear. If Guile is to adopt a similar approach, it should pay
> attention to these details as well.

Indeed.

> > The important point for Guile is that the solution is there, in Free
> > Software, documented well enough, and people who understand the
> > implementation and can explain its subtleties are still here, ready to
> > help. All it takes is for Guile to decide it wants to implement
> > something similar.
>
> It would be important for Guile to be a sufficient basis for emacs.

That's not my point.  My point is that the Emacs model, or some minor
variant thereof, should be a good model for Guile (or any other
environment that seeks to support complex multi-lingual applications),
_regardless_ of whether Guile will ever become the core of the Emacs
Lisp interpreter.  IOW, it's good for Guile itself.

> On the other hand, emacs' needs might be far too high for any simple
> string type. For example, Guile might treat strings as simple
> sequences of code points while emacs might impose some Unicode
> normalization requirements or vice versa.
>
> For example, what should
>
>    (string= "Åström" "Åström")
>
> return?
>
> Emacs 25.1 doesn't see the strings as equal.

As it should, IMO.  Testing strings for equivalence under canonical or
compatibility decompositions is not the job of string=; it requires a
separate API.  (Emacs provides in ucs-normalize.el the functionality
required for that.)

There are situations where you want the former, and others where you
want the latter.  That's why Unicode normalization is not implemented
in Emacs on the same level as the string data type, and the
application needs to explicitly request normalization in order for it
to happen.

In general, string equivalence is in many use cases an
application-level feature (think interactive text searching), and
needs to be language- and locale-sensitive to satisfy users (e.g., it
turns out users of Spanish locales don't consider "ñ" (one character)
to be equivalent to "ñ" (two characters)).
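
To make the distinction between plain comparison and a separate
normalization API concrete, here is a minimal sketch.  It is written in
Python, not Emacs Lisp or Guile, purely because the standard
unicodedata module keeps it short; it does not show how Emacs or
ucs-normalize.el work internally.  It assumes that the two strings in
the example above differ only in that one uses precomposed characters
and the other uses base letters followed by combining marks (they look
identical on screen, so the code spells them out with escapes).  The
function names are hypothetical.

```python
import unicodedata

# Assumption: the two "Åström" strings differ only in composition.
# Escapes are used so the difference is visible in the source.
precomposed = "\u00c5str\u00f6m"    # Å and ö as single code points
decomposed = "A\u030astro\u0308m"   # A + combining ring, o + combining diaeresis


def equal_code_points(a: str, b: str) -> bool:
    """Plain comparison: true only if the code point sequences match."""
    return a == b


def equal_canonical(a: str, b: str) -> bool:
    """Canonical equivalence: compare after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def equal_compatibility(a: str, b: str) -> bool:
    """Compatibility equivalence: compare after normalizing both to NFKC."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


if __name__ == "__main__":
    print(len(precomposed), len(decomposed))             # 6 8
    print(equal_code_points(precomposed, decomposed))    # False
    print(equal_canonical(precomposed, decomposed))      # True
    # The "fi" ligature U+FB01 matches "fi" only under compatibility rules.
    print(equal_canonical("\ufb01", "fi"))               # False
    print(equal_compatibility("\ufb01", "fi"))           # True
```

The caller picks which notion of equality it wants, which is the
design being argued for: the basic comparison stays cheap and exact,
and normalization happens only when the application asks for it.
Locale-sensitive matching, as in the Spanish example, would need yet
another layer on top of this and is not covered by the sketch.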
