[ 
https://issues.apache.org/jira/browse/SANDBOX-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607011#action_12607011
 ] 

Sebb commented on SANDBOX-242:
------------------------------

Certainly looks odd. What is the context - i.e. where is it used?

Also, how are double-quotes within the data handled?
If there are none, you might be able to pre-process the file by replacing every
pair of adjacent double-quotes with a single double-quote; it will then look more
like standard CSV:

","AD","ALV","Andorra la Vella","Andorra la Vella",,"--34-6--","AI","0601",,"4230N 00131E","""

The exception is the extra double-quote at each end of the line. If these are
always as shown here, then just remove those as well:

,"AD","ALV","Andorra la Vella","Andorra la Vella",,"--34-6--","AI","0601",,"4230N 00131E",""

before processing as CSV.
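
As a rough sketch of the two pre-processing steps above (not a tested
solution for the real file), something like the following might work. It is
written in Python only because its standard csv module makes the result easy
to check; the names unwrap_line and parse_record are made up for this
example, and it assumes the data itself never contains a double-quote.

```python
import csv
import io
from typing import List


def unwrap_line(line: str) -> str:
    """Turn one encapsulated record into a conventional CSV line.

    Assumption: no field value contains a double-quote of its own.
    """
    line = line.rstrip("\r\n")
    # Step 1: collapse every pair of double-quotes into one double-quote.
    line = line.replace('""', '"')
    # Step 2: drop the single quote left over at each end of the line.
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1]
    return line


def parse_record(line: str) -> List[str]:
    """Unwrap the line, then hand it to an ordinary CSV parser."""
    return next(csv.reader(io.StringIO(unwrap_line(line))))


raw = ('",""AD"",""ALV"",""Andorra la Vella"",""Andorra la Vella"",,'
       '""--34-6--"",""AI"",""0601"",,""4230N 00131E"","""""')
fields = parse_record(raw)
print(fields)
```

For the sample record this should give twelve fields, with the unquoted
empty fields and the final quoted empty field all coming back as empty
strings.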

It is still a bit odd, because the final empty field is enclosed in quotes,
whereas the other empty fields are not.
However, it should still parse OK.

Another possibility: if the CSV parser supports multi-character delimiters, 
then just define the delimiter as two double-quotes.
But you would still need to remove the enclosing quotes first.
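
To illustrate that second idea without relying on any particular parser,
here is a small hand-rolled sketch in Python. It reads "delimiter" as the
two-character string that encloses each value, which is only one
interpretation; split_doubled is a hypothetical helper, not part of any CSV
library, and it again assumes the values never contain a double-quote.

```python
from typing import List

QUOTE = '""'  # two-character string enclosing each value (assumption)


def split_doubled(line: str) -> List[str]:
    """Split a record whose values are wrapped in doubled double-quotes."""
    line = line.rstrip("\r\n")
    # Remove the single quote that encloses the whole record.
    if len(line) >= 2 and line[0] == '"' and line[-1] == '"':
        line = line[1:-1]
    fields = []
    i = 0
    while True:
        if line.startswith(QUOTE, i):
            # Enclosed value: read up to the next doubled quote.
            end = line.find(QUOTE, i + len(QUOTE))
            if end == -1:
                raise ValueError("unterminated value at offset %d" % i)
            fields.append(line[i + len(QUOTE):end])
            i = end + len(QUOTE)
        else:
            # Bare value (possibly empty): read up to the next comma.
            end = line.find(",", i)
            if end == -1:
                end = len(line)
            fields.append(line[i:end])
            i = end
        if i >= len(line):
            break
        if line[i] != ",":
            raise ValueError("expected a comma at offset %d" % i)
        i += 1  # skip the comma separating two fields
    return fields


sample = ('",""AD"",""ALV"",""Andorra la Vella"",""Andorra la Vella"",,'
          '""--34-6--"",""AI"",""0601"",,""4230N 00131E"","""""')
print(split_doubled(sample))
```

This avoids rewriting the quotes at all, at the cost of maintaining a custom
splitter instead of reusing a standard CSV parser.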


> Encapsulated csv line strategy
> ------------------------------
>
>                 Key: SANDBOX-242
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-242
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: CSV
>            Reporter: marc schipperheyn
>            Priority: Minor
>
> I ran into this CSV file:
> ",""AD"",""ALV"",""Andorra la Vella"",""Andorra la Vella"",,""--34-6--"",""AI"",""0601"",,""4230N 00131E"","""""
> It basically, and weirdly, encapsulates an entire line in " as a sort of record
> identifier, and uses doubled "" to enclose values.
> I'm having difficulty finding a tool that can parse this easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
