Hi,

On 30.01.21 21:28, Eric Fischer wrote:
> A couple of years ago I went down this route of thinking I would add CSV support to sort, and then let myself get distracted into trying to follow https://paulfitz.github.io/2017/01/24/the-year-of-poop-on-the-desktop.html

Well, but not everyone is using PSV format; many are using some kind of CSV format. I sometimes use CSV (or SSV, semicolon-separated values ;) as a simple compatibility format when working with people not using the GNU operating system.

Even within ASCII there are seldom-used characters that look helpful for character-separated value files, e.g., "Unit Separator" (0x1f), which practically removes the need for quoted fields. But since not everybody uses those characters already, a tool that bridges the worlds of RFC 4180 CSV(*) and GNU Coreutils might be handy.

Seldom-used ASCII (i.e., single-byte) characters could serve as the field separator to enable working with GNU tools, even if this happens only inside a pipeline and is never seen by the user:

    csvconv -f, -t$'\x1f' data.csv | sort -t$'\x1f' | csvconv -f$'\x1f' -t,

(This uses an imaginary CSV tool "csvconv" to convert from (-f) one separator to (-t) another while observing CSV quoting rules.)

Disclaimer: I did not check whether sort works correctly with "-t$'\x1f'".

To allow newlines inside a field, one could terminate each row of CSV data with NUL and use "sort -z". Thus the imaginary csvconv could offer "--input-zero-terminated" and "--output-zero-terminated" options as well.

The imaginary csvconv's adherence to (generalized) CSV quoting rules would be its primary difference from "tr", "sed", or "awk".

Thanks,
Erik

(*) RFC 4180 requires CRLF instead of LF as the end-of-line sequence, but many implementations just use the native end-of-line sequence.
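
To make the idea a bit more concrete, here is a rough Python sketch of what the imaginary csvconv might do. Everything in it is an assumption for illustration only: the function names csvconv and quote_field, and the zero_in / zero_out parameters (standing in for the "--input-zero-terminated" and "--output-zero-terminated" options), are hypothetical. It parses with Python's csv module and writes fields back with minimal double-quote quoting; it is not a complete RFC 4180 implementation (for instance, it always writes LF, never CRLF).

```python
import csv
import io


def quote_field(field, sep):
    """Quote a field only if it contains the separator, a quote, or a line break."""
    if any(ch in field for ch in (sep, '"', "\n", "\r")):
        return '"' + field.replace('"', '""') + '"'
    return field


def csvconv(data, from_sep=",", to_sep="\x1f", zero_in=False, zero_out=False):
    """Hypothetical stand-in for the imaginary csvconv tool.

    Converts CSV text from one single-character separator to another
    while observing double-quote quoting rules.
    """
    if zero_in:
        # Rows are NUL-terminated: split on NUL first, then parse each row.
        chunks = [c for c in data.split("\0") if c]
    else:
        chunks = [data]

    rows = []
    for chunk in chunks:
        # newline="" lets the csv module see line breaks inside quoted fields.
        reader = csv.reader(io.StringIO(chunk, newline=""), delimiter=from_sep)
        rows.extend(reader)

    end = "\0" if zero_out else "\n"
    return "".join(
        to_sep.join(quote_field(f, to_sep) for f in row) + end for row in rows
    )
```

With this sketch, csvconv('a,"b,c"\n') drops the quotes, because the field "b,c" contains no Unit Separator, and converting the result back with from_sep="\x1f" and to_sep="," restores them. A field containing a newline stays quoted in either direction, which is why the NUL-terminated variant would be needed together with "sort -z".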