> If you implement csv in sort you’ll have to implement it in head, tail, uniq, join, wc, etc. etc. etc...
Could the format-processing logic be extracted? This might also be a place for some kind of abstraction, such as a format processor, an unquoted format processor, etc.

On Sun 31. Jan 2021 at 0.59, Erik Auerswald <[email protected]> wrote:

> Hi,
>
> On 30.01.21 21:28, Eric Fischer wrote:
> > A couple of years ago I went down this route of thinking I would add CSV
> > support to sort, and then let myself get distracted into trying to follow
> > https://paulfitz.github.io/2017/01/24/the-year-of-poop-on-the-desktop.html
>
> Well, but not everyone is using PSV format, many are using some
> kind of CSV format. I sometimes use CSV (or SSV, semicolon
> separated values ;) as a simple compatibility format when working
> with people not using the GNU operating system.
>
> Even with ASCII there are seldom used characters that look helpful
> for character separated value files, e.g., "Unit Separator" (0x1f),
> to practically get rid of the need for quoted fields.
>
> But since not everybody uses those characters already, a tool that
> bridges the worlds of RFC 4180 CSV(*) and GNU Coreutils might be
> handy.
>
> Seldom used ASCII (i.e., single byte) characters could be used as
> field separator to enable working with GNU tools, even if this is
> just used in a pipeline, but never seen by the user:
>
>     csvconv -f, -t$'\x1f' data.csv | sort -t$'\x1f' | csvconv -f$'\x1f' -t,
>
> (This uses an imaginary CSV tool "csvconv" to convert from (-f) one
> separator to (-t) another while observing CSV quoting rules.)
>
> Disclaimer: I did not check if sort works correctly with "-t$'\x1f'".
>
> To allow newlines inside a field one could terminate each row of CSV
> data with NUL, and use "sort -z". Thus the imaginary csvconv could
> use "--input-zero-terminated" and "--output-zero-terminated" options
> as well.
>
> The imaginary "csvconv"'s adherence to (generalized) CSV quoting
> rules would be the primary difference to "tr", "sed", or "awk".
>
> Thanks,
> Erik
>
> (*) RFC 4180 requires CRLF instead of LF as end-of-line sequence, but
> many implementations just use the native end-of-line sequence.

-- 
Thanks!
Best regards,
Grigorii
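
To make the "extracted format processor" idea a bit more concrete, here is a rough sketch of what the imaginary csvconv could do, written in Python on top of its standard csv module. The names csvconv and sort_csv_lines and their parameters are hypothetical and only mirror the pipeline quoted above; this is not an existing tool or a proposed Coreutils interface. It assumes that no field contains a newline (that case would need the NUL-terminated rows mentioned above) and that the data never contains the Unit Separator itself.

```python
import csv
import io


def csvconv(text, from_sep=",", to_sep="\x1f"):
    """Re-delimit CSV text while observing CSV quoting rules.

    Hypothetical stand-in for the imaginary "csvconv -f FROM -t TO".
    """
    # newline="" keeps line endings untouched so the csv module sees them as-is.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=from_sep)
    out = io.StringIO(newline="")
    # QUOTE_MINIMAL only quotes fields that need it for the *target* separator.
    writer = csv.writer(out, delimiter=to_sep, lineterminator="\n",
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(reader)
    return out.getvalue()


def sort_csv_lines(text):
    """Mimic: csvconv -f, -t US | sort | csvconv -f US -t,

    Assumes no field contains a newline or the Unit Separator (0x1f).
    """
    unit = "\x1f"
    converted = csvconv(text, ",", unit)
    # Plain line-based sort, standing in for a tool that knows nothing about CSV.
    lines = sorted(line for line in converted.split("\n") if line)
    return csvconv("\n".join(lines) + "\n", unit, ",")


if __name__ == "__main__":
    data = 'b,"x,y"\na,z\n'
    print(repr(csvconv(data)))
    print(repr(sort_csv_lines(data)))
```

The point of the sketch is that the quoting logic lives in one place (csvconv), while the sorting step stays a plain line-based operation that does not need to know about CSV at all.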
