    Date:        Thu, 30 Jul 2020 15:53:53 +0200
    From:        Steffen Nurpmeso <stef...@sdaoden.eu>
    Message-ID:  <20200730135353.qwslp%stef...@sdaoden.eu>

  | The problem being that what is in the wild does not work out for
  | many languages.

I admit to not knowing a lot of the internationalisation issues, or
of Unicode, but I don't understand this at all.

The quoting mechanisms in the shell provide a means to create specific
bit patterns to assign to variables, pass as parameters to programs, etc.
I don't see that the mechanism by which they're encoded in the sh language
should matter all that much; the same thing could be read from a file
instead ( var=$(cat file) ), in which case the shell spec has no control
over the bit patterns at all.

Of course the quoting mechanisms make a difference to the ease of use
for the sh programmer, but that's an entirely different issue.

  | The in-use shell quote pattern consisting of small, isolated parts
  | which depend on which kind of escaping and expanding is necessary
  | just does not work out for many languages.

Can you give an example of something which cannot be done (assuming $''
as currently intended to be specified)? Note: not an example of someone
using the mechanisms to do the wrong thing - there are zillions of ways
to write bad code - but an example of something which cannot be done
correctly as specified. Then we'll see if that really matters.

  | ? echo Don"'"t you worry$'\x21' The sun shines on us. $'\u263A'
  |
  | The latter is what i mean. There are many languages on this world
  | where these \u expansions do not work out that way, but where the
  | "entire sentence must be interpreted as a unity" in order to get
  | the iconv(3) conversation to nl_langinfo(CODESET) correctly, aka
  | the way it is _desired_.

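A minimal sketch of the point above that quoting is only one way to produce a given bit pattern: the same bytes can be built from adjacent quoted pieces of a word, or read from a file. It assumes a POSIX-style sh with mktemp(1) available; the variable names and the temporary file are invented for the illustration.

```shell
# Build the same string two ways and compare the results.
# Assumptions: POSIX sh plus mktemp(1); all names are illustrative only.

# 1. From quoting: three adjacent pieces form a single word,
#    Don + "'" + 't you worry!'
from_quotes=Don"'"'t you worry!'

# 2. From a file: printf writes the bytes (\047 is ', \041 is !),
#    and command substitution reads them back.
tmp=$(mktemp)
printf 'Don\047t you worry\041' > "$tmp"
from_file=$(cat "$tmp")
rm -f "$tmp"

if [ "$from_quotes" = "$from_file" ]; then
    echo same
else
    echo different
fi
```

Both routes end up with the identical value, so this prints "same"; the shell's quoting rules only affected how the first one was typed.
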
Surely this depends upon how the shell works - if the shell is
attempting to convert just the \u escape into some other codeset,
I can see your point, but it doesn't need to work like that - it can
work internally in 10646 code points (whether encoded in 16 or 32 bit
values, or as UTF-8), and only convert to the desired charset when
actually used (that is, when about to run "echo", at which point the
entire string is available).

In any case, if the user has specified a specific Unicode code point,
shouldn't that always be what is generated, regardless of whether it
makes sense or not?

  | And for that it would be tremendous if $'' would be defined so
  | that it can be used as the sole quoting mechanism,

No thanks. Partly because $'' is already implemented (widely)
and used (perhaps slightly less widely so far) - so that ship has sailed.

I believe I've seen $" ... " used that way somewhere though (don't
recall where) and I believe it is a mistake. As soon as you have
multiple different types of expansions that can occur, there are
problems with which one gets priority, that is, which is performed first.

So, assuming there is a $"..." which works as you desire, what happens with

	$"${VAR+foo\x7Dbar}"

Do we get foo}bar or foobar} ? (assuming VAR was set, of course).

Whichever way you pick, there will be arguments for doing it the other
way, in some other case. This stuff simply becomes a mess. Please,
don't go there.

If we wanted to add C type encodings along with the others, we'd need to
do it in a way that is consistent with the other expansions, perhaps using
something like $[x7D] or $[u263A] or $[n] (but no, this is not a serious
suggestion).

And I cannot fathom how this in any way overcomes your earlier objection:
quoted strings in sh are not units, they're simply pieces of some longer
word (or can be) - your Don"'"t example above (and the worry$'\x21') are
both examples of that.

kre
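
The ordering question around $"${VAR+foo\x7Dbar}" can be sketched by emulating both orders by hand. A $"..." form with C-style escapes is hypothetical here, so the escape-decoding step is faked with printf '%b' and the octal escape \0175 (the same byte as \x7D); the variable names are invented for the illustration.

```shell
# Emulate the two possible orders for the hypothetical $"${VAR+foo\x7Dbar}".
# Assumption: printf '%b' stands in for C-style escape decoding.
VAR=set

# Order A: escapes are decoded first, so the shell then sees
# "${VAR+foo}bar}" - the decoded } closes the expansion early.
decode_first="${VAR+foo}bar}"

# Order B: the parameter expansion runs first and yields foo\0175bar,
# and only afterwards is the escape decoded into a literal }.
escaped='foo\0175bar'
expand_first=$(printf '%b' "${VAR+$escaped}")

printf '%s\n' "$decode_first" "$expand_first"
```

This prints foobar} for the first order and foo}bar for the second, which are exactly the two candidate answers in question.
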