On 22 May 2017 at 18:38, Steve Dower <steve.do...@python.org> wrote:
> Okay, I think I get the problem now. We expect backends to let child
> subprocesses just spit out whatever *they* want onto the same stdout/stderr.

s/expect/allow/

The paranoid in me suspects "expect" is also true, though :-)

> I'm really not a fan of forcing front ends to clean up that mess, and so I'd
> still suggest that the backend "tool" be a script to launch the actual tool
> and do the conversion to UTF-8.

What you're referring to as the backend "tool" being a script is what
the PEP refers to as a "shim" (as Nick pointed out to me), and it is
considered part of the front end. The back end is a set of Python APIs
which are called by the front end (in any real-life front end, via the
front end's shim script).

> Perhaps the middle ground is to specify encoding='utf-8', errors='anything
> but strict' for front-ends, and well-behaved backends should do the work to
> transcode when it is known to be necessary for the tools they run. (i.e.
> frontends do not crash, backends have a simple rule for avoiding loss of
> data).

For front ends, "never crash" is essential. But "produce as readable as
possible data" is also a high priority. Consider for example a Russian
user with a series of directories named in Russian. If the tools write
an error using his local 8-bit encoding, and the front end assumes
UTF-8, then all of the high-bit characters in his directory names would
be replaced. Deciphering an error message like
"File ???????/?????/?????.c: unexpected EOF" is problematic... :-(

The model assumes that most front ends would call the backend via a
subprocess "shim" that is maintained by the front end project. But the
expectation here seems to be that the backend is allowed to write
directly to the stdio streams of its process (or at least, to let the
tools it calls do so). So the shim *cannot* control the encoding of the
data received by the front end, and the encoding therefore has to be
agreed between backend and front end.

The basic question is how the responsibility for dealing with data in an
uncertain encoding is allocated. It seems to me there are 2 schools of
thought:

1. There are likely to be fewer front ends than back ends, and so the
   front end(s) (basically, pip) should deal with the problem. Also,
   backends are more likely to be written by developers who are looking
   at very specific scenarios, and asking them to handle all the
   complexities of robust multilingual coding is raising the bar on
   writing a backend too high.
2. The backend is where the problem lies, and so the backend should
   address the issue. Furthermore, a well-established principle in
   dealing with encodings is to convert to strings right at the boundary
   of the application, and in this case the backend is the only code
   that has access to that boundary.

(I tend towards (2), but I honestly can't say to what extent that's
because it makes it "someone else's problem" for me ;-))

As you say, the middle ground here is that front ends must never crash,
and back ends should (but aren't required to) produce output in a
specified encoding (I still prefer the locale encoding as that has the
best chance of avoiding the ????/???? issue). That's more or less what
pip has to deal with now (and not that far off (1)), and my current
attempt to address that situation is at
https://github.com/pypa/pip/pull/4486 for what it's worth.

A couple of final thoughts. I would expect that testing the handling of
encodings is likely to be an important issue (at least, I expect
there'll be bugs, and adding tests to make sure they get properly fixed
will be important). Handling tool output encoding in the backend is
likely to involve relatively low-level interface functions, where the
inputs and outputs can be relatively easily mocked. So I would expect
backend unit testing of encoding handling to be relatively
straightforward. Conversely, testing front end handling of encoding
issues is very tricky - it's necessary to set up system state to
persuade the build tools to produce the data you want to test against
(it feels like integration testing rather than unit testing).
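
To make the testing point concrete, here is a minimal sketch of what
backend-side handling and its unit tests might look like. It is not code
from pip, the PEP, or any real backend: the helper names
(decode_tool_output, transcode_to_utf8, frontend_decode) are
hypothetical, cp1251 is only an assumed example of a local 8-bit
encoding, and the byte string stands in for mocked tool output.

```python
import locale


def decode_tool_output(raw, tool_encoding=None):
    """Hypothetical backend helper: turn raw tool output into text.

    tool_encoding is the encoding the backend believes the tool uses;
    when it is not given, fall back to the locale's preferred encoding.
    """
    if tool_encoding is None:
        tool_encoding = locale.getpreferredencoding(False)
    # backslashreplace never raises; undecodable bytes stay visible as \xNN.
    return raw.decode(tool_encoding, errors="backslashreplace")


def transcode_to_utf8(raw, tool_encoding=None):
    """Re-encode tool output as UTF-8 before handing it to the front end."""
    return decode_tool_output(raw, tool_encoding).encode("utf-8")


def frontend_decode(raw):
    """A front end that assumes UTF-8 and must never crash."""
    return raw.decode("utf-8", errors="replace")


# Mocked tool output: an error message mentioning Russian directory
# names, written in an 8-bit encoding (cp1251 is an assumption here).
MESSAGE = "File каталог/файл.c: unexpected EOF"
RAW = MESSAGE.encode("cp1251")


def test_backend_transcoding_preserves_names():
    # School (2): the backend converts at the boundary, so the front end
    # sees the original message intact.
    assert frontend_decode(transcode_to_utf8(RAW, "cp1251")) == MESSAGE


def test_frontend_alone_does_not_crash_but_loses_names():
    # School (1) with a plain UTF-8 assumption: no crash, but the
    # directory names are replaced.
    text = frontend_decode(RAW)
    assert "\ufffd" in text                 # high-bit bytes were replaced
    assert "каталог" not in text            # the directory name is gone
    assert text.endswith("unexpected EOF")  # the ASCII part survives
```

Both tests need nothing more than a byte string, which is the sense in
which the backend side is easy to mock; reproducing the same bytes from
a real build tool on the front end side would require arranging the
system locale and file names first.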

Also, fixing encoding issues in the backend decouples the fix from pip's
release cycle, which is probably a good thing (unless the backend is not
well maintained, but that's an issue in itself).

Paul
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig