Reuben Thomas wrote:
>I agree that this behaviour is undesirable. Unfortunately it's deep-seated.
>I have done some work on it, but I don't yet have something I can release.
Oh cool, so I'm not the first person to notice, and there's already been some progress.

A quick thought about how this could be tackled: internally, you could represent the input-checking step explicitly, as something distinct from a "mere copy" operation. You interpret "UTF-8..UTF-8" as a checking step, and then some checking steps can be optimised out of the operation sequence. Checking that the input conforms to a particular charset immediately after conversion to that same charset can be optimised out. Checking conformance to an 8-bit single-byte charset that assigns a character to every byte value is null and can be optimised out. And there are some cases where checks for different charsets are equivalent.

A further refinement: in some cases there might be value in splitting a conversion step into a checking step followed by a non-checking conversion. The value is that the checking step might then be optimised out, depending on the prior step of the pipeline. At a later stage of optimisation, any checking step and non-checking conversion that are still adjacent could perhaps be recombined into an ordinary checking conversion of the kind you already have.

-zefram
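A rough Python sketch of the pipeline idea above, under loud assumptions: the step types `Check` and `Convert`, the functions `split_checks`, `elide_checks`, `recombine` and `plan`, and the `TOTAL_SINGLE_BYTE` set are all hypothetical and not taken from any real converter's internals. It assumes every conversion always emits well-formed output in its target charset, treats an "A..A" hop as a pure check, and does not model the "equivalent checks for different charsets" case.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical: single-byte charsets assigning a character to every byte
# 0x00-0xFF, so a conformance check against them can never fail.
TOTAL_SINGLE_BYTE = {"ISO-8859-1", "CP437"}


@dataclass(frozen=True)
class Check:
    """Verify that the data conforms to `charset`; otherwise a plain copy."""
    charset: str


@dataclass(frozen=True)
class Convert:
    """Convert src -> dst; `checking` means the input is validated first."""
    src: str
    dst: str
    checking: bool = True


Step = Union[Check, Convert]


def split_checks(steps: List[Step]) -> List[Step]:
    """Split each checking conversion into Check + non-checking Convert."""
    out: List[Step] = []
    for s in steps:
        if isinstance(s, Convert) and s.checking:
            out.append(Check(s.src))
            out.append(Convert(s.src, s.dst, checking=False))
        else:
            out.append(s)
    return out


def elide_checks(steps: List[Step]) -> List[Step]:
    """Drop checks that cannot fail given the preceding step."""
    out: List[Step] = []
    for s in steps:
        if isinstance(s, Check):
            if s.charset in TOTAL_SINGLE_BYTE:
                continue  # every byte sequence conforms: null check
            prev = out[-1] if out else None
            # Assumes converters always produce well-formed output.
            if isinstance(prev, Convert) and prev.dst == s.charset:
                continue
            if isinstance(prev, Check) and prev.charset == s.charset:
                continue  # duplicate check
        out.append(s)
    return out


def recombine(steps: List[Step]) -> List[Step]:
    """Fuse an adjacent Check(X) + non-checking Convert(X, Y) back together."""
    out: List[Step] = []
    for s in steps:
        prev = out[-1] if out else None
        if (isinstance(s, Convert) and not s.checking
                and isinstance(prev, Check) and prev.charset == s.src):
            out[-1] = Convert(s.src, s.dst, checking=True)
        else:
            out.append(s)
    return out


def plan(request: str) -> List[Step]:
    """Turn a simplified 'A..B..C' request into an optimised step list."""
    names = request.split("..")
    steps: List[Step] = [
        Check(a) if a == b else Convert(a, b)
        for a, b in zip(names, names[1:])
    ]
    return recombine(elide_checks(split_checks(steps)))


if __name__ == "__main__":
    for req in ["UTF-8..UTF-8", "ISO-8859-1..UTF-8..UTF-8", "UTF-8..UTF-16"]:
        print(req, "->", plan(req))
```

With these rules, "UTF-8..UTF-8" stays a single `Check("UTF-8")`, "ISO-8859-1..UTF-8..UTF-8" reduces to one non-checking conversion, and "UTF-8..UTF-16" is split and then recombined back into an ordinary checking conversion.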