Digging into this a bit further --

POSIX defines a "print" character class, which I believe is an exact fit. The Unicode spec doesn't define such a class, which I presume is why D's std.uni library omits it too. But libc has an isprint() function, which I should be able to use (I'm on a POSIX system). It consults the current locale, so it isn't limited to ASCII characters (unlike std.ascii:isPrintable).
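One caveat worth sketching in C: isprint() only accepts values representable as unsigned char (or EOF), so it classifies single bytes, not decoded code points. The wide-character counterpart is iswprint() from <wctype.h>. The two helper names below are made up for illustration, and the second one assumes a platform where wchar_t holds Unicode code points (as on glibc); results for non-ASCII input depend on the locale selected with setlocale().

```c
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: byte-oriented check. The cast matters because
   passing a negative char value to isprint() is undefined behavior. */
static int byte_is_printable(char c)
{
    return isprint((unsigned char)c) != 0;
}

/* Hypothetical helper: check a decoded code point. Assumes wchar_t
   stores Unicode code points; the answer depends on LC_CTYPE. */
static int codepoint_is_printable(unsigned long cp)
{
    return iswprint((wint_t)cp) != 0;
}
```

So for step (3) on decoded UTF-8, iswprint() is probably the function to reach for rather than isprint().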

So that's one down, two to go:

  Loop until newline or EOF
   (1) Read bytes or character             } Possibly
   (2) Decode UTF-8, exception if invalid  } together
   (3) Call isprint(), exception if not printable
  Return line

(This simplified outline obviously doesn't show how to deal with the complications arising from using buffers, handling codepoints that straddle the end of the buffer, etc.)
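To make the outline concrete, here is a minimal sketch in C using only the standard library. It is not a finished design: decode_utf8, read_printable_line and the line_status codes are names invented for this example, error codes stand in for exceptions, and iswprint() is used instead of isprint() on the assumption that wchar_t holds Unicode code points. It sidesteps the buffer-straddling problem by collecting the raw bytes of a whole line first and decoding afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <wctype.h>

/* Decode one UTF-8 sequence from s[0..n). Stores the code point in *cp and
   returns the bytes consumed, or 0 if the sequence is invalid or truncated.
   Rejects overlong forms, surrogates, and values above U+10FFFF. */
static size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *cp)
{
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                 /* stray continuation or invalid lead byte */

    if (n < len) return 0;         /* sequence cut off */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

enum line_status {
    LINE_OK, LINE_EOF, LINE_TOO_LONG, LINE_BAD_UTF8, LINE_NOT_PRINTABLE
};

/* Read one line (newline stripped) into buf, then validate it.
   On LINE_TOO_LONG the rest of the line is left unread (sketch only). */
static enum line_status read_printable_line(FILE *in, char *buf, size_t cap,
                                            size_t *out_len)
{
    size_t len = 0, pos = 0;
    int ch;

    if (cap == 0) return LINE_TOO_LONG;

    /* (1) Read raw bytes until newline or EOF. */
    while ((ch = getc(in)) != EOF && ch != '\n') {
        if (len + 1 >= cap) return LINE_TOO_LONG;
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) return LINE_EOF;
    buf[len] = '\0';

    while (pos < len) {
        uint32_t cp;
        /* (2) Decode UTF-8; fail on invalid input. */
        size_t used = decode_utf8((const unsigned char *)buf + pos,
                                  len - pos, &cp);
        if (used == 0) return LINE_BAD_UTF8;
        /* (3) Check printability; the result depends on LC_CTYPE. */
        if (!iswprint((wint_t)cp)) return LINE_NOT_PRINTABLE;
        pos += used;
    }
    *out_len = len;
    return LINE_OK;
}
```

A caller would typically run setlocale(LC_ALL, "") once at startup so that iswprint() recognizes non-ASCII characters, then call read_printable_line(stdin, buf, sizeof buf, &n) in a loop until it returns LINE_EOF.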

Where I'm still stuck is the read or read-and-auto-decode: this is where the waters get really muddy for me. Three different techniques for reading characters are suggested in this thread (iopipe, ranges, rawRead): https://forum.dlang.org/thread/cgteipqqfxejngtpg...@forum.dlang.org

I'd like to stick with standard D or C libraries initially, so that rules out iopipe for now. What would really help is some details about what one read technique does particularly well vs. another. And is there a technique that seems more suited to this use case than the rest?

Thanks again
