On Nov 25, 2013, at 1:05 PM, Jonathan Rochkind wrote:
> Ah, but what if the data itself has tabs! Doh!
>
> It can be a mess either way. There are standards (or conventions?) for
> escaping internal commas in CSV -- which doesn't mean the software that was
> used to produce the CSV, or the software you are using to read it, actually
> respects them.
You don't have to escape the commas; you just have to double-quote the string.
If you want a literal double quote inside the string, you put two in a row, e.g.:
"He said, ""hello"""
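A minimal sketch of that quoting rule, using Python's stdlib csv module (Python is an assumption here, picked only because it came up in the thread; the sample row is made up):

```python
import csv
import io

# Hypothetical sample row: the first field contains a comma and double quotes.
row = ['He said, "hello"', "plain"]

# Write the row; the default dialect quotes only the fields that need it
# and doubles any embedded double quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
line = buf.getvalue()

# Reading the line back recovers the original fields.
parsed = next(csv.reader(io.StringIO(line)))
```

Here `line` comes out as the quoted form shown above followed by `,plain`, and `parsed` equals the original row.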
> But I'm not sure if there are even standards/conventions for escaping tabs in
> a tab-delimited text file?
No official ones that I'm aware of. I've seen some parsers that will
consider a backslash before a delimiter to be an escape, but I don't know if
there's an official spec for tab- / pipe- / whatever-delimited text.
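As a rough illustration of the backslash convention (an informal convention, not a spec), Python's csv module can be configured to write and read it. The option values and sample data below are assumptions for the sketch:

```python
import csv
import io

# Assumed convention: no quoting at all, and a backslash in front of a
# literal tab marks it as data rather than a delimiter.
opts = dict(delimiter="\t", quoting=csv.QUOTE_NONE, escapechar="\\",
            lineterminator="\n")

row = ["has\ttab", "plain"]  # hypothetical field with an embedded tab

buf = io.StringIO()
csv.writer(buf, **opts).writerow(row)
line = buf.getvalue()  # the embedded tab is now preceded by a backslash

# A reader given the same options undoes the escaping.
parsed = next(csv.reader(io.StringIO(line), **opts))
```

Whether another program on the receiving end honors the backslash is a separate question, which is rather the point.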
> Really, the lesson to me is that you should always consider using an existing
> well-tested library for both reading and writing these files, whether CSV or
> tab-delimited -- even if you think "Oh, it's so simple, why bother with
> that." There will be edge cases. That you will discover only when they cause
> bugs, possibly after somewhat painful debugging. A well-used third-party
> library is less likely to have such edge case bugs.
Agreed, but in this case, it might be easier to bypass the library. (If you
were using a library, you'd have to shift an empty element onto the front of
each row, then output it.)
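For comparison, the library route might look something like this in Python. It is only a sketch: the function name and the sample input are hypothetical, and it assumes the file is plain tab-delimited text.

```python
import csv
import io

def prepend_empty_column(src, dst):
    """Copy tab-delimited rows from src to dst, adding an empty first field."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Put an empty element at the front of the row, then output it.
        writer.writerow([""] + row)

# Hypothetical two-row input held in memory instead of a real file.
src = io.StringIO("a\tb\nc\td\n")
dst = io.StringIO()
prepend_empty_column(src, dst)
result = dst.getvalue()
```

Note that the reader still treats double quotes as quoting characters by default, which may or may not match how the input file was produced.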
> I am more ruby than python; in ruby there is a library for reading and
> writing CSV in the stdlib.
> http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
And I'm more perl, and generally lazy for this simple of an edit:
perl -pi -e 's/^/\t/' file_to_convert
(the '-p' tells it to apply the transformation to each line and print the
result, '-i' tells it to edit the file in place (use '-i.bak' instead to keep
a copy of the original with '.bak' appended), and "-e 's/^/\t/'" is to put a
tab at the front of the line)
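For the python folks, roughly the same edit can be sketched with the stdlib fileinput module. The function name is made up, and this version always keeps a '.bak' copy:

```python
import fileinput

def prepend_tab(path):
    """Rewrite path in place, putting a tab at the front of every line."""
    # backup=".bak" keeps the original as path + ".bak", like perl's -i.bak.
    with fileinput.input(files=[path], inplace=True, backup=".bak") as f:
        for line in f:
            # With inplace=True, print() writes into the file being rewritten.
            print("\t" + line, end="")
```

Calling `prepend_tab("file_to_convert")` would then leave the original in `file_to_convert.bak`.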
-Joe