So, here is a `head' of my dataset. Note the `,,' in the second-to-last
column.
02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60
02-FEB-2009,09:55:04:987,26022009,2800,PE,108.75,200,111,50,11700,1450,,2865.60
02-FEB-2009,09:55:04:939,26022009,3100,CE,31.1,3000,36.55,200,3500,5250,,2865.60
02-FEB-2009,09:55:04:989,26022009,2600,PE,52.05,500,57,400,16050,1150,,2865.60
02-FEB-2009,09:55:04:981,26022009,3000,CE,56.25,2000,67,150,21500,13750,,2865.60
02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60
02-FEB-2009,09:55:04:953,26022009,2800,CE,150,50,159.7,5000,13400,15500,,2865.60
02-FEB-2009,09:55:04:987,26022009,2700,PE,72.15,3000,79,50,19200,5100,,2865.60
02-FEB-2009,09:55:04:615,26022009,2450,CE,256.9,500,678,500,500,500,,2865.60
02-FEB-2009,09:55:04:894,26022009,3300,CE,6,7000,10.8,2000,7000,2550,,2865.60
The documentation says that ",," should be read as "". Instead, the
function throws an error that I cannot understand. See here:
R> library(data.table)
data.table 1.8.7 For help type: help("data.table")
R> tt <- fread("sample.csv", verbose=TRUE)
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Starting format detection on line 30 (the last non blank line in the first 30)
Detected sep as ',' and 13 columns
Type codes: 3300320200002
Found first row with 13 fields occuring on line 1 (either column names or first row of data)
Error in fread("sample.csv", verbose = TRUE) : Unexpected character (
02-F) ending field 12 of line 1
Using na.strings="" does not work either, though I guess that should
not have made a difference anyway?
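As a point of comparison, base R's read.csv() treats an empty numeric
field as NA. A minimal sketch, using two rows from the sample above
inline (via the text= argument) rather than the attached file:

```r
# Two rows copied from the sample above: one with an empty 12th field,
# one with a value in it.
rows <- c(
  "02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60",
  "02-FEB-2009,09:55:04:991,26022009,2900,CE,81,1000,100,100,18100,4550,1000,2865.60"
)

# header = FALSE because the data has no header row; columns become V1..V13.
df <- read.csv(text = rows, header = FALSE, stringsAsFactors = FALSE)

# Inspect the 12th column, where the empty field sits.
print(df$V12)
```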
Then I opened the file in GVim and converted all `,,' to `,NA,' and
re-read the file. This time it works.
R> tt <- fread("sample-with-NA.csv", verbose=TRUE)
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Starting format detection on line 30 (the last non blank line in the first 30)
Detected sep as ',' and 13 columns
Type codes: 3300320200002
Found first row with 13 fields occuring on line 1 (either column names or first row of data)
The first data row has some non character fields. Treating as a data row and using default column names.
Count of eol after pos: 101
Subtracted 1 for last eol and any trailing empty lines, leaving 100 data rows
0.000s ( 6%) Memory map (quicker if you rerun)
0.000s ( 40%) Format detection
0.000s ( 7%) Count rows (wc -l)
0.000s ( 2%) Allocation of 100x13 result (xMB) in RAM
0.000s ( 41%) Reading data
0.000s ( 0%) Bumping column type midread and coercing data already read
0.000s ( 3%) Changing na.strings to NA
0.001s Total
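The `,,' to `,NA,' substitution I did in GVim could also be done from
within R. A minimal sketch, assuming empty fields only ever appear
between two commas (a leading or trailing empty field would need
separate handling); fill_empty_fields is a hypothetical helper name,
not part of data.table:

```r
# Hypothetical helper: replace every empty field (",,") with ",NA,".
# A single gsub() pass does not rescan overlapping matches, so ",,,"
# needs a second pass; loop until no ",," is left.
fill_empty_fields <- function(x) {
  while (any(grepl(",,", x, fixed = TRUE))) {
    x <- gsub(",,", ",NA,", x, fixed = TRUE)
  }
  x
}

# One row copied from the sample above.
row <- "02-FEB-2009,09:55:04:962,26022009,2500,PE,36,500,44,200,11850,1100,,2865.60"
patched <- fill_empty_fields(row)
print(patched)

# Applied to the attached file, this would look like:
# writeLines(fill_empty_fields(readLines("sample.csv")), "sample-with-NA.csv")
```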
I've attached a 100-row sample.csv and a sample-with-NA.csv so that
you can replicate the issue.
Maybe I am just missing something. Can you explain?
Thanks a lot!
--
ASB.