The last time the dd spec came up on here, people suggested it was literally a practical joke. That said, POSIX published the darn thing, so let's look through it.
First, we will not be implementing the whole thing. It requires ASCII/EBCDIC mapping, which was dead 30 years ago. So the question is what _subset_ is worth implementing.

bs= without data-modifying conversions means you output what you input. (If you got a short read, you do a write of that size.) Otherwise, you collate input blocks into output blocks of the requested size.

Question: what happens if there's a short write? Do you collate to the next full output block size, or do you re-write the missing chunk as a short write?

Question: what if bs= and obs= are both specified? (Answer: bs= wins.)

sync is silly. swab is silly. But both are easy to do...

Question: lcase and ucase are UTF-8 now, and any fixed block size is going to chop characters in the middle. The spec says conversions operate independently of input blocking, so I guess I need a minimum buffer size of 512 or so...

(Gotta look up what this block/unblock stuff is doing...) Ok, block and unblock do nothing unless you specify cbs=, the "conversion block size", which is different from ibs=, obs=, and bs=. Right, I'm going to throw that in the "did not implement" pile and wait for somebody to complain.

There's a bs=123x456 syntax in the spec. We didn't previously implement that, and I'm not adding it now because it's crazy and $((123*456)) exists.

No if= means default to stdin, no of= means default to stdout. Got it.

SIGINT causes progress indicator output, but I have a "todo" that says it's not ending the process...

Question: if bs= _isn't_ specified (and neither is ibs= or obs=), I vaguely recall the default block size is 512 bytes. Is that considered bs= being specified in terms of the "write what you read" behavior, or do we fill up 512-byte output blocks if we read less than that? (This matters if you dd from /dev/ttyS0 and get bytes typed by humans.)

If your output block is a short write, do you retry the rest of that block or do a whole next block?

of= is truncated by default, to the seek= position if that's specified.
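To pin down the two blocking modes as I read them, here's a minimal C sketch (not the actual toybox code). With bs= every read is written out at whatever size it came back; with separate ibs=/obs= the bytes get collated into obs-sized output blocks, with a short final block at EOF. Reads are simulated from an array of read() return sizes so short reads are easy to model, and writes go to an in-memory buffer. struct sink, emit(), collate() and print_summary() are hypothetical names made up for this sketch, and it deliberately doesn't model short writes, since that's still an open question above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counters for the POSIX "records in/out" summary lines. */
struct dd_stats {
  unsigned long in_full, in_part, out_full, out_part;
};

/* Hypothetical in-memory output: stands in for write() so the resulting
 * blocking can be inspected. */
struct sink {
  char data[4096];
  size_t len;
  size_t block_sizes[64];
  size_t nblocks;
};

/* "Write" one output block and count it as whole or partial. */
static void emit(struct sink *s, const char *buf, size_t n, size_t obs,
                 struct dd_stats *st)
{
  assert(s->len + n <= sizeof(s->data) && s->nblocks < 64);
  memcpy(s->data + s->len, buf, n);
  s->len += n;
  s->block_sizes[s->nblocks++] = n;
  if (n == obs) st->out_full++;
  else st->out_part++;
}

/* Push simulated reads through the blocking logic. reads[] holds the byte
 * count each read() returned (1..ibs; anything under ibs is a short read).
 * passthrough != 0 models bs=: every read becomes one output block as-is.
 * Otherwise bytes are collated into obs-sized blocks, and any leftover is
 * flushed as a final partial block at EOF. */
static void collate(const char *in, const size_t *reads, size_t nreads,
                    size_t ibs, size_t obs, int passthrough,
                    struct sink *s, struct dd_stats *st)
{
  char obuf[512];   /* staging buffer for one output block */
  size_t fill = 0, pos = 0;

  assert(obs <= sizeof(obuf) && (!passthrough || ibs == obs));
  for (size_t i = 0; i < nreads; i++) {
    size_t n = reads[i], done = 0;

    assert(n > 0 && n <= ibs);
    if (n == ibs) st->in_full++;
    else st->in_part++;

    if (passthrough) {
      emit(s, in + pos, n, obs, st);
    } else {
      /* Copy this read into the staging buffer, flushing each full block. */
      while (done < n) {
        size_t chunk = obs - fill;

        if (chunk > n - done) chunk = n - done;
        memcpy(obuf + fill, in + pos + done, chunk);
        fill += chunk;
        done += chunk;
        if (fill == obs) {
          emit(s, obuf, fill, obs, st);
          fill = 0;
        }
      }
    }
    pos += n;
  }
  if (fill) emit(s, obuf, fill, obs, st);
}

/* Uses %lu rather than the spec's %u because the counters are unsigned long. */
static void print_summary(FILE *f, const struct dd_stats *st)
{
  fprintf(f, "%lu+%lu records in\n", st->in_full, st->in_part);
  fprintf(f, "%lu+%lu records out\n", st->out_full, st->out_part);
}
```

With ibs=4 obs=3 and reads of 4, 2 and 4 bytes, collation produces output blocks of 3+3+3+1 bytes ("2+1 records in", "3+1 records out"), while the bs=4 path writes 4+2+4 ("2+1 records out"). Keeping a separate obs-sized staging buffer next to the ibs-sized read buffer costs a memcpy per read but caps memory at ibs+obs no matter how the two sizes relate.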
Truncation is disabled with conv=notrunc.

Edge case: if you specify ibs=prime1 obs=prime2, then the smallest internal buffer you can have without memcpy is ibs*obs... except even _that_ won't work if a short read screws up the alignment. If you're willing to do memcpy to preserve block size, then you just need ibs+obs as your worst case, and can memcpy to realign after each block if necessary.

Three potential output formats:

  "%u+%u records in\n", <number of whole input blocks>, <number of partial input blocks>
  "%u+%u records out\n", <number of whole output blocks>, <number of partial output blocks>
  "%u truncated %s\n", <number of truncated blocks>, "record" (if <number of truncated blocks> is one), "records" (otherwise)

Great, and now THIS nonsense in the rationale:

> Another point is that a failed read on a regular file or a disk
> generally does not increment the file offset, and dd must then
> seek past the block on which the error occurred; otherwise, the
> input error occurs repetitively. When the input is a magnetic
> tape, however, the tape normally has passed the block containing
> the error when the error is reported, and thus no seek is necessary.

So... try to seek after an error but ignore failure of lseek()? The bit about writing a partial block after an error without noerror... I guess that's from obs being larger than ibs? (Because read() either returns data _or_ an error, not both...)

Sigh. What a mess. Next up, a similarly close reading of the man page...

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net