[
https://issues.apache.org/jira/browse/SANDBOX-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607011#action_12607011
]
Sebb commented on SANDBOX-242:
------------------------------
Certainly looks odd. What is the context - i.e. where is it used?
Also, how are double-quotes within the data handled?
If there are none, you might be able to pre-process the file by replacing all
occurrences of 2 double-quotes by 1 double-quote; it will then look more like
standard CSV:
","AD","ALV","Andorra la Vella","Andorra la Vella",,"-34-6-","AI","0601",,"4230N 00131E","""
That leaves only the extra double-quote at each end of the line. If these are
always as shown here, then just remove those as well:
,"AD","ALV","Andorra la Vella","Andorra la Vella",,"-34-6-","AI","0601",,"4230N 00131E",""
before processing as CSV.
It is still a bit odd, because the final empty field is enclosed in quotes,
whereas other empty fields are not.
However, it should still parse OK.
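The pre-processing described above could be sketched roughly as follows. This
is only an illustration: it uses Python's standard csv module as a stand-in for
whatever CSV parser is actually in use, the helper name preprocess is made up
for the example, and the sample is the line from the issue description with the
email line wrapping removed. It also assumes the data never contains
double-quotes of its own, which is still an open question.

```python
import csv


def preprocess(line: str) -> str:
    """Turn one doubly-quoted record into ordinary CSV text (hypothetical helper)."""
    # Step 1: collapse every pair of double-quotes into a single double-quote.
    collapsed = line.replace('""', '"')
    # Step 2: drop the quote that encloses the whole record, but only if one is
    # present at both ends (assumption: the ends always look like the sample).
    if len(collapsed) >= 2 and collapsed[0] == '"' and collapsed[-1] == '"':
        collapsed = collapsed[1:-1]
    return collapsed


# Sample record as reported in the issue, joined onto a single line.
RAW = '",""AD"",""ALV"",""Andorra la Vella"",""Andorra la Vella"",,""--34-6--"",""AI"",""0601"",,""4230N 00131E"","""""'

# csv.reader accepts any iterable of lines, so a one-element list is enough here.
fields = next(csv.reader([preprocess(RAW)]))
print(fields)
```

With this sample the parser returns twelve fields; the unquoted empty fields
and the quoted empty final field all come back as empty strings.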
Another possibility: if the CSV parser supports multi-character delimiters,
then just define the delimiter as two double-quotes.
But you would still need to remove the enclosing quotes first.
> Encapsulated csv line strategy
> ------------------------------
>
> Key: SANDBOX-242
> URL: https://issues.apache.org/jira/browse/SANDBOX-242
> Project: Commons Sandbox
> Issue Type: Improvement
> Components: CSV
> Reporter: marc schipperheyn
> Priority: Minor
>
> I ran into this csv file:
> ",""AD"",""ALV"",""Andorra la Vella"",""Andorra la Vella"",,""--34-6--"",""AI"",""0601"",,""4230N 00131E"","""""
> It basically and weirdly encapsulates an entire line in " as a sort of record
> identifier and has double "" to identify values
> I'm having difficulty finding a tool that can parse this easily.