Hi Matthew,
I am using the new `data.table` `fread()` function to read my csv
files. When I read one with the read.csv function, the data looks
like this:
        Date.and.Time Open High  Low Close Volume
1 2007/01/01 22:51:00 5683 5683 5673  5673     64
2 2007/01/01 22:52:00 5675 5676 5674  5674     17
3 2007/01/01 22:53:00 5674 5674 5673  5674     42
The first column holds the whole timestamp, e.g.
`2007/01/01 22:53:00`, and the next 5 columns are separated by
commas. But when I read the same file using fread, I get the
following output:
    V1 V2                                             V3
1 2007  1 01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
2 2007  1 01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
3 2007  1 01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
This appears to be because the separator autodetection picks "/"
as the separator. I tried overriding this with the `sep=","`
argument, but it does not seem to be used anywhere in the function.
Furthermore, when using verbose I get the following output, which
suggests that I was right in thinking that "/" is used as the
separator rather than ",".
Is there any way to fix this, so that it correctly reads all 6
columns separately?
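For reference, a minimal reproduction might look like the sketch
below. It assumes data.table is installed; the temporary file and
the names `rows`, `tmp` and `dt` are only illustrative, and the
three rows are the ones shown above. In a data.table release where
`sep` is honored this should give 6 columns, while 1.8.7 may still
show the behavior described above.

```r
library(data.table)

# Three sample rows: a timestamp followed by five comma-separated values.
rows <- c(
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17",
  "2007/01/01 22:53:00,5674.00,5674.00,5673.00,5674.00,42"
)

# Write the rows to a temporary csv file (illustrative path).
tmp <- tempfile(fileext = ".csv")
writeLines(rows, tmp)

# Request the comma separator explicitly and state that there is no header;
# verbose = TRUE prints what fread detected while reading.
dt <- fread(tmp, sep = ",", header = FALSE, verbose = TRUE)

# Number of columns actually read.
print(ncol(dt))

unlink(tmp)
```

The verbose output from this call should show which separator was
chosen, which makes it easy to attach to a bug report.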
Thanks
HLM
On 21 Dec 2012, at 18:28, Matthew Dowle <[email protected]> wrote:
Hi datatablers,
Feedback and bug reports much appreciated :
=====
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* integers>2^31 are detected and read natively as bit64::integer64.
* accepts filenames, URLs and "A,B\n1,2\n3,4" directly
* new implementation entirely in C
* with a 50MB .csv, 1 million rows x 6 columns :
    read.csv("test.csv")                                    # 30-60 sec
    read.table("test.csv",<all known tricks, known nrows>)  # 10 sec
    fread("test.csv")                                       # 3 sec
* airline data: 658MB csv (7 million rows x 29 columns)
    read.table("2008.csv",<all known tricks, known nrows>)  # 360 sec
    fread("2008.csv")                                       # 50 sec
See ?fread. Many thanks to Chris Neff and Garrett See for ideas,
discussions and beta testing.
=====
1.8.7 is passing checks on Unix and Windows (but not Mac yet) :
install.packages("data.table", repos="http://R-Forge.R-project.org")
require(data.table)
?fread
fread("your biggest baddest file")
Oddly, R-Forge appears to be compiling Win64 with -O2 optimization
rather than -O3 (but -O3 on Win32 is ok), so speedups might not be
as great on Win64 until that can be resolved on R-Forge, unless you
compile it yourself. -O3 has some optimizations that fread may
benefit from. Either way, I'd be interested to hear how it goes.
Season's greetings!
Matthew
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help