Btw we like backticks in data.table :
DT[,`Date and Time`]
setkey(DT,`Date and Time`) # [*]
although you'd probably setnames(DT,"Date and Time","datetime") for a
core column like that.
[*] which I've just noticed doesn't work, will file new bug report.
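For anyone following along, here is a minimal sketch of the backtick and setnames usage mentioned above. The table contents are illustrative only (values borrowed from the sample rows in this thread), and the sketch keys on the renamed column rather than on the backticked name, since the latter is the case reported as not working.

```r
library(data.table)

# Illustrative data only; values borrowed from the sample rows in this thread.
DT <- data.table(
  `Date and Time` = c("2007/01/01 22:51:00",
                      "2007/01/01 22:52:00",
                      "2007/01/01 22:53:00"),
  Close = c(5673, 5674, 5674)
)

v <- DT[, `Date and Time`]                 # backticks select the column as a vector
setnames(DT, "Date and Time", "datetime")  # rename by reference
setkey(DT, datetime)                       # key on the now-syntactic name
```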
On 28.12.2012 22:06, Matthew Dowle wrote:
Great. Thanks for confirming.
The file itself has "Date and Time" as the column name doesn't it
i.e. with spaces not dots? fread retains exactly what's in the file,
whereas read.csv runs the column names through base::make.names()
which converts the spaces to dots to make the column names
syntactically valid, iiuc. data.table's general policy is to allow
spaces and other unusual characters in column names and retain them
throughout (forgiving the odd bug now fixed caused by some make.names
calls which should have been make.unique).
To do the same as read.csv :
DT = fread(...)
setnames(DT,make.names(names(DT)))
Not sure I understood correctly and I didn't test.
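A small base R illustration of the difference between make.names and make.unique referred to above. The name vector is made up for the example (the duplicated "Open" is not from the file in this thread).

```r
# Hypothetical column names: one with spaces, one duplicated.
nms <- c("Date and Time", "Open", "Open")

syntactic   <- make.names(nms)                 # spaces become dots, duplicates kept
syntactic_u <- make.names(nms, unique = TRUE)  # dots, and duplicates made unique
unique_only <- make.unique(nms)                # spaces retained, duplicates made unique
```

make.unique only disambiguates duplicates, which is why it fits a policy of retaining spaces and other unusual characters in column names.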
On 28.12.2012 21:36, Hideyoshi Maeda wrote:
The sep argument now works, thank you!
But just out of curiosity (not a major problem): when using
fread(file.path,sep=",") on my csv file, the column names include "."
as shown in my original email, but the output automatically removes
the "." from the column name. Is there a way to stop it from doing
that? I.e. the first column becomes "Date and Time" when using fread,
rather than the original "Date.and.Time" when using read.csv.
On 26 Dec 2012, at 22:21, Matthew Dowle <[email protected]>
wrote:
sep is now passed through and have added your example as a test.
Hope ok now.
Thanks,
Matthew
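A minimal sketch of the sep override on the sample rows from this thread, assuming a data.table version in which sep is passed through and in which fread accepts the data as a string containing "\n".

```r
library(data.table)

# Sample rows as quoted in this thread, passed directly as text.
txt <- paste(
  "Date and Time,Open,High,Low,Close,Volume",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17",
  sep = "\n"
)

# Force the comma as separator so that "/" in the dates is not picked up.
DT <- fread(txt, sep = ",")
```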
On 24.12.2012 14:18, Hideyoshi Maeda wrote:
Using autostart=1 gives the following error:
Error in fread(file.path, autostart = 1) :
  ' ends field 2 on line 1 when detecting types: Date and Time,Open,High,Low,Close,Volume
2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
On 24 Dec 2012, at 13:48, Matthew Dowle <[email protected]>
wrote:
Yes, autostart is the line on which it detects separators; from there
it searches upwards to find the first row with the same number of
columns. If that row is all character then it deems that the column
name row. So if you set autostart to 1, it's already at the top and it
might catch the right separator by avoiding the data rows for
separator detection.
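The upward search described above can be sketched in base R as follows. This is a toy illustration only, not fread's actual C implementation: find_header is a hypothetical helper, the separator is passed in rather than detected, and "all character" is approximated as "no field parses as a number".

```r
# Toy sketch only; find_header is a hypothetical helper, not part of data.table.
find_header <- function(lines, autostart, sep = ",") {
  # Number of fields on each line for the given separator
  nfields <- lengths(strsplit(lines, sep, fixed = TRUE))
  target <- nfields[autostart]
  # Walk upwards while the row above has the same number of fields
  top <- autostart
  while (top > 1L && nfields[top - 1L] == target) top <- top - 1L
  fields <- strsplit(lines[top], sep, fixed = TRUE)[[1]]
  # Treat the row as column names if no field parses as a number
  all_char <- all(is.na(suppressWarnings(as.numeric(fields))))
  list(top = top, ncol = target, header = all_char)
}

lines <- c(
  "Date and Time,Open,High,Low,Close,Volume",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64",
  "2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17"
)
res <- find_header(lines, autostart = 3L)
```

With these three lines and a comma separator, the search starts on line 3 and ends on line 1, which has no numeric fields and so would be treated as the column name row.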
On 24.12.2012 11:52, Hideyoshi Maeda wrote:
Thanks for the quick response.
I wasn't sure if I understood you correctly, but isn't the problem the
way that autostart finds separators?
And in my example, it had headers, so I think it would need to start
from row 2, wouldn't it, i.e. the first row that has non-header values?
Thanks
On 24 Dec 2012, at 11:44, Matthew Dowle <[email protected]>
wrote:
Hi,
Ah yes, haven't hooked up the sep override yet, apologies, will fix.
Maybe setting autostart to the row number of the header row (probably 1)
might work.
Thanks,
Matthew
On 24.12.2012 11:08, Hideyoshi Maeda wrote:
Oops, forgot to add the output from the verbose part. Here it is:
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Starting format detection on line 30 (the last non blank line in the first 30)
Detected sep as '/' and 3 columns
Type codes: 003
Found first row with 3 fields occuring on line 1 (either column names or first row of data)
The first data row has some non character fields. Treating as a data row and using default column names.
Count of eol after pos: 1143699
Subtracted 1 for last eol and any trailing empty lines, leaving 1143698 data rows
   0.153s ( 21%) Memory map (quicker if you rerun)
   0.000s (  0%) Format detection
   0.095s ( 13%) Count rows (wc -l)
   0.001s (  0%) Allocation of 1143698x3 result (xMB) in RAM
   0.480s ( 66%) Reading data
   0.000s (  0%) Bumping column type midread and coercing data already read
   0.002s (  0%) Changing na.strings to NA
   0.731s        Total
On 24 Dec 2012, at 11:04, Hideyoshi Maeda
<[email protected]> wrote:
Hi Matthew,
I am using the new `data.table` `fread()` function to read my csv
files, which have the following format when read with the read.csv
function:
        Date.and.Time Open High  Low Close Volume
1 2007/01/01 22:51:00 5683 5683 5673  5673     64
2 2007/01/01 22:52:00 5675 5676 5674  5674     17
3 2007/01/01 22:53:00 5674 5674 5673  5674     42
The value of the first column is all of: `2007/01/01 22:53:00`; the
next 5 columns are separated with commas.
But when reading the same file using fread I get the following output:
    V1 V2                                             V3
1 2007  1 01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
2 2007  1 01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
3 2007  1 01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
This is because the autodetect is using the "/" as a separator.
I tried overriding this using the `sep=","` argument but this does not
seem to be used in the function anywhere.
Furthermore, when using verbose I get the following output, which
suggests that I was right in thinking that "/" is used as a separator
rather than ",".
Is there any way to fix this, so that it correctly reads all 6 columns
separately?
Thanks
HLM
On 21 Dec 2012, at 18:28, Matthew Dowle
<[email protected]> wrote:
Hi datatablers,
Feedback and bug reports much appreciated :
=====
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* integers>2^31 are detected and read natively as bit64::integer64.
* accepts filenames, URLs and "A,B\n1,2\n3,4" directly
* new implementation entirely in C
* with a 50MB .csv, 1 million rows x 6 columns :
    read.csv("test.csv")                                     # 30-60 sec
    read.table("test.csv",<all known tricks, known nrows>)   # 10 sec
    fread("test.csv")                                        # 3 sec
* airline data: 658MB csv (7 million rows x 29 columns)
    read.table("2008.csv",<all known tricks, known nrows>)   # 360 sec
    fread("2008.csv")                                        # 50 sec
See ?fread. Many thanks to Chris Neff and Garrett See for ideas,
discussions and beta testing.
=====
1.8.7 is passing checks on Unix and Windows (but not Mac yet) :
install.packages("data.table", repos="http://R-Forge.R-project.org")
require(data.table)
?fread
fread("your biggest baddest file")
Oddly, R-Forge appears to be compiling Win64 with -O2 optimization
rather than -O3 (but -O3 on Win32 ok), so speedups might not be as
great on Win64 until that can be resolved on R-Forge, unless you
compile yourself. -O3 has some optimizations that fread may benefit
from. But interested to hear.
Season's greetings!
Matthew
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help