Just this week, I had to correct a routine for a customer of mine that
outputs
a line of character data in CSV format. The routine had some errors and
flaws. I would like to tell you what I did and discuss some of the
properties of the CSV format.
First: the routine gets as input a number (number of items) and
an array of char pointers that point to (null terminated) char data
which have to be written to the output file in CSV format. All data
should be char data; the numeric values have been converted to
char before the call to that routine. Another parameter to the routine
is the separation char, which is semicolon in our case (could well be
comma or tab, no problem at all).
Second: for CSV, it is ok to surround every field with quotes. ExCel etc.
works perfectly with this and omits the quotes while reading the CSV.
This is very good, because this way you don't have problems if the
separation char appears inside the value fields. So this is what I did.
Another goodie: if the content of the field is numeric, ExCel does sometimes
strange things; it tries to interpret the value and converts for example
currency values
to date fields etc., which is plain nonsense. You get around this if you
enclose all your fields in quotes. Of course, if would not be necessary
for char fields that are not numeric. But we do it for all fields, that
makes
the logic simpler.
Third: the only thing you have to consider: if a qoute appears inside
your value fields, you have to double it.
Forth: to reduce the size of the CSV file, I omit trailing blanks in the
char fields, that is: in the loop that puts the chars into the output
buffer,
the position of the last non-blank char is stored in a temp variable,
and the closing quote is put just behind that position, when the end
of the output string is reached (implicit right trim of the output string
at almost no cost). This reduced the output files to ca. 2/3 of the
original size, without any effect on the apperance of the result in ExCel.
I guess this is Turing-complete, BTW.
<ad>
This is exactly the same logic I implemented in the CSV output (and input)
routine of my DB2 and Oracle ETL tool that I mentioned in some prior posts.
Feel free to contact me offline, if you want to know more about this.
</ad>
To the List admins: I decided not to put Ad: in the subject, because
this post in my opinion was a technical post for the most part. It is
sometimes dfficult to separate pure advertising from technical
information, how something is or should be done.
I hope you can accept this.
Kind regards
Bernd
Am 26.02.2015 um 18:51 schrieb Paul Gilmartin:
On Thu, 26 Feb 2015 08:20:49 -0800, Sri h Kolusu wrote:
Tony suggested the use of Tab (X'05') as delimiter which will avoid the
problem of data already have the common delimiter comma.
I stand by my assertion that lacking a priori knowledge that a character
can not occur in the data the task becomes more difficult; not impossible.
And I can't resist mentioning (again):
http://xkcd.com/327/
This is the second time in this week that you had a comment about DFSORT.
I am a developer of DFSORT and there are many cool things that DFSORT can
do and I am more than willing to show them. However I also believe using
the right utility/program for the right job. I do not suggest to use
DFSORT for every solution I posted here.
My apologies if I offended. I recognize that DFSORT isn't the only tool in your
kit, and I'm aware that you suggest other techniques when appropriate.
Still, since DFSORT appears to be the Swiss Army Knife of z/OS, I can't
resist wondering, idly, whether DFSORT is Turing-complete.
-- gil
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN