On Nov 25, 2013, at 1:05 PM, Jonathan Rochkind wrote:

> Ah, but what if the data itself has tabs!  Doh!
> 
> It can be a mess either way.  There are standards (or conventions?) for 
> escaping internal commas in CSV -- which doesn't mean the software that was 
> used to produce the CSV, or the software you are using to read it, actually 
> respects them.

You don't have to escape the commas; you just have to double-quote the string.  
If you want a literal double quote inside it, you put two in a row, e.g.:

        "He said, ""hello"""

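For what it's worth, Python's standard csv module applies exactly this rule with its default settings: a field containing a comma or a double quote is wrapped in quotes, and each embedded quote is doubled. A minimal sketch; to_csv_line and from_csv_line are hypothetical helper names, and lineterminator="\n" is set only to keep the output short:

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one row with the csv module's default ("excel") quoting rules."""
    buf = io.StringIO()
    # Default quoting is QUOTE_MINIMAL with doublequote=True: fields containing
    # the delimiter or a quote get wrapped in quotes, and inner quotes are doubled.
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()

def from_csv_line(line):
    """Parse one CSV line back into a list of fields."""
    return next(csv.reader(io.StringIO(line, newline="")))

line = to_csv_line(['He said, "hello"'])
print(line, end="")  # prints: "He said, ""hello"""
```
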

> But I'm not sure if there are even standards/conventions for escaping tabs in 
> a tab-delimited text file?

No official ones that I'm aware of.  I've seen some parsers that will treat a 
backslash before a delimiter as an escape, but I don't know of an official spec 
for tab-, pipe-, or whatever-delimited text.
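
As one example of that backslash convention (not a standard, just a configuration), Python's csv module can be told to skip quoting entirely and instead put an escape character in front of any delimiter found in the data. A minimal sketch; the TSV_BACKSLASH settings and helper names are hypothetical, and whatever reads the file on the other end has to agree on the same convention:

```python
import csv
import io

# Hypothetical dialect settings: tab-delimited, no quoting, backslash escapes.
TSV_BACKSLASH = dict(
    delimiter="\t",
    quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
    escapechar="\\",         # instead, put a backslash before an embedded tab
    lineterminator="\n",
)

def write_tsv(rows):
    buf = io.StringIO()
    csv.writer(buf, **TSV_BACKSLASH).writerows(rows)
    return buf.getvalue()

def read_tsv(text):
    # The reader honors the same escapechar, so "\<TAB>" is read as a literal tab.
    return list(csv.reader(io.StringIO(text, newline=""), **TSV_BACKSLASH))
```
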



> Really, the lesson to me is that you should always consider using an existing 
> well-tested library for both reading and writing these files, whether CSV or 
> tab-delimited, even if you think "Oh, it's so simple, why bother with 
> that."  There will be edge cases that you will discover only when they cause 
> bugs, possibly after somewhat painful debugging. A well-used third-party 
> library is less likely to have such edge case bugs.
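
To illustrate the library route for tab-delimited data: Python's csv module with delimiter="\t" round-trips fields containing tabs, quotes, and even line breaks. One caveat, in the spirit of the point above: it does this by applying CSV-style quoting to the tab-delimited output, so a consumer that simply splits lines on tabs won't undo it. A minimal sketch; roundtrip_tsv and the sample rows are made up for illustration:

```python
import csv
import io

def roundtrip_tsv(rows):
    """Write rows as tab-delimited text with the csv module, then read them back."""
    buf = io.StringIO(newline="")
    # With delimiter="\t" and the default QUOTE_MINIMAL, any field containing a
    # tab, a double quote, or a line-break character is wrapped in double quotes.
    csv.writer(buf, delimiter="\t").writerows(rows)
    text = buf.getvalue()
    buf.seek(0)
    return text, list(csv.reader(buf, delimiter="\t"))

tricky = [["tab\there", 'quote "x"', "two\nlines"], ["plain", "", "end"]]
text, back = roundtrip_tsv(tricky)
```
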

Agreed, but in this case it might be easier to bypass the library.  (If you 
were using a library, you'd have to prepend an empty element to each row, then 
write the row back out.)
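
For comparison, the library version of "prepend an empty element to each row" might look like this in Python, assuming the input is tab-delimited; prepend_empty_column is a hypothetical name. It is a few more lines than a one-liner, but because it parses quoted fields it won't insert a tab into the middle of a field that spans lines, which a line-by-line edit would:

```python
import csv
import io

def prepend_empty_column(tsv_text):
    """Add an empty first field to every row of tab-delimited text."""
    src = io.StringIO(tsv_text, newline="")
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(src, delimiter="\t"):
        writer.writerow([""] + row)  # the empty element at the front of each row
    return out.getvalue()
```
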


> I am more ruby than python; in ruby there is a library for reading and 
> writing CSV in the stdlib. 
> http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html

And I'm more Perl, and generally lazy when it comes to an edit this simple:

        perl -pi.bak -e 's/^/\t/' file_to_convert

(The '-p' tells it to loop over each line, apply the expression, and print the 
result; '-i.bak' tells it to edit the file in place, first saving a copy of the 
original with '.bak' appended; and "-e 's/^/\t/'" is the expression itself, 
which puts a tab at the front of each line.)
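
If you'd rather stay in Python, the standard library's fileinput module can do roughly the same in-place edit with a backup. A minimal sketch; prefix_tab_in_place is a hypothetical helper name, and like the one-liner it works on physical lines, so a quoted field that spans several lines would get a tab inserted into its middle:

```python
import fileinput

def prefix_tab_in_place(path, backup=".bak"):
    """Hypothetical helper: prepend a tab to every line of `path`, in place.

    Roughly mirrors the Perl one-liner with -i.bak: the original file is kept
    as path + backup, and print() output is redirected into the rewritten file.
    """
    with fileinput.input(files=(path,), inplace=True, backup=backup) as f:
        for line in f:
            # `line` keeps its trailing newline, so don't let print() add another.
            print("\t" + line, end="")
```

Usage would be prefix_tab_in_place("file_to_convert").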

-Joe
