On 06/12/2011 23:13, Wes McKinney wrote:
> I think R has two functions read.csv and read.csv2, where read.csv2 is
> capable of dealing with things like European decimal format.
>
I may be wrong, but from R's help I understand that read.csv, read.csv2,
read.delim, ... are just calls to read.table with different default
values (for the separator, the decimal sign, ...).
This function read.table is indeed pretty flexible (see the signatures below).

Having a dedicated fast function for properly formatted CSV tables may be
a good idea. But how to define "properly formatted"? I've seen many tiny
variations, so I'm not sure!

Now, for my personal use, I was frustrated not so much by loading
performance as by NA support, so I wrote my own loadCsv function to get a
masked array. It is neither beautiful nor very efficient, but it does
the job!

Best,
Pierre

read.table & co. signatures:

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", row.names, col.names,
           as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text)

read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".",
         fill = TRUE, comment.char="", ...)

read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",",
          fill = TRUE, comment.char="", ...)
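
The "one flexible reader plus thin wrappers with different defaults" pattern seen in these signatures could be sketched in Python roughly as follows. This is only an illustration using the standard library: the names read_table, read_csv and read_csv2 are hypothetical (they are not an existing NumPy API), and the sketch assumes every data field is numeric or an NA marker.

```python
import csv
import io
from functools import partial


def read_table(text, sep=",", dec=".", header=True, na_strings=("NA", "")):
    """Hypothetical flexible reader: parse delimited text into
    (headers, rows), where each field is a float or None for NA."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    rows = [row for row in reader if row]  # skip blank lines
    headers = rows.pop(0) if header else None

    def to_float(field):
        field = field.strip()
        if field in na_strings:
            return None  # missing value
        # Normalise the decimal sign before converting
        return float(field.replace(dec, "."))

    data = [[to_float(field) for field in row] for row in rows]
    return headers, data


# Thin wrappers that differ only in their default values,
# in the spirit of R's read.csv and read.csv2
read_csv = partial(read_table, sep=",", dec=".")
read_csv2 = partial(read_table, sep=";", dec=",")

headers, data = read_csv2("a;b\n1,5;NA\n")
```

Keeping all the parsing logic in a single function means the wrappers cannot drift apart; they only record which conventions (separator, decimal sign) a given flavour of file uses.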
---------------------------------------------------------
Copy-pasted from my own dirty "csv toolbox":

import numpy as np

NA = -9999.  # sentinel standing in for missing values

def _NA_conv(s):
    '''Convert a string number representation into a float,
    with a special behaviour for "NA" values:
    if s is "" or "NA", return the sentinel NA (set to -9999.)
    '''
    if isinstance(s, bytes):  # older NumPy versions pass bytes to converters
        s = s.decode()
    s = s.strip()
    if s == '' or s == 'NA':
        return NA
    return float(s)

def loadCsv(filename, delimiter=',', usecols=None, skiprows=1):
    '''Wrapper around numpy.loadtxt to load a properly R-formatted
    CSV file with NA values, whose first row should be a header row.

    Returns
    -------
    (headers, data, dataNAs)
    '''
    # 1) Read header
    with open(filename) as f:
        headers = f.readline().strip().split(delimiter)
    # Apply the NA converter to every column that is read,
    # including the case where usecols is None (all columns)
    cols = range(len(headers)) if usecols is None else usecols
    converters = {i: _NA_conv for i in cols}
    if usecols is not None:
        headers = [headers[i] for i in usecols]
    # 2) Read data
    data = np.loadtxt(filename,
                      delimiter=delimiter, usecols=usecols,
                      skiprows=skiprows,
                      converters=converters)
    # Note: a genuine data value equal to the sentinel is also masked
    dataNAs = (data == NA)
    # Set NAs to zero
    data[dataNAs] = 0.
    # Transform the array into a "masked array"
    data = np.ma.masked_array(data, dataNAs)
    return (headers, data, dataNAs)

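
For comparison, NumPy's own np.genfromtxt can produce a masked array directly, without a sentinel value. The snippet below is a minimal sketch on made-up inline data (the csv_text content is purely illustrative), assuming a reasonably recent NumPy that accepts text streams.

```python
import io

import numpy as np

# Illustrative data: one "NA" field and one empty field
csv_text = "a,b,c\n1.0,NA,3.0\n4.0,5.0,\n"

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,        # skip the header row
    missing_values="NA",  # treat "NA" as missing (empty fields are missing too)
    usemask=True,         # return a masked array
)

print(data.mask)  # True where a value was missing
```

Because the mask is built from the text fields themselves, a legitimate data value can never be mistaken for a missing one, which is the main weakness of the -9999 sentinel approach.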
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion