No problem about the confirmation, and thanks again for fixing it. As for the file itself having "Date and Time", you are right. I had just assumed that fread was designed to replace and speed up read.csv, i.e. to work in exactly the same way but faster. Thanks for letting me know about the make.names call, though.
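For anyone following along, here is a minimal base-R sketch of the renaming discussed in this thread. It only illustrates what read.csv and make.names do to a header containing spaces; it does not call fread, so it shows nothing about fread itself beyond what Matthew describes below. The variable names (csv_text, raw_names and so on) are made up for the illustration, and the sample rows are copied from my original email.

```r
# Base R only; no data.table needed for this illustration.
# The header and rows are copied from the thread; variable names are hypothetical.
csv_text <- "Date and Time,Open,High,Low,Close,Volume
2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
2007/01/01 22:52:00,5675.00,5676.00,5674.00,5674.00,17"

# Default read.csv: column names are passed through make.names(),
# so spaces become dots.
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the header exactly as written in the file.
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

# Applying make.names() by hand to the raw names reproduces the default.
fixed_names <- make.names(raw_names)

print(default_names)
print(raw_names)
stopifnot(identical(fixed_names, default_names))
```

So the dots come from read.csv's name checking rather than from the file, and going in either direction is a one-line change.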

On 28 Dec 2012, at 22:06, Matthew Dowle <[email protected]> wrote:

>
> Great. Thanks for confirming.
>
> The file itself has "Date and Time" as the column name doesn't it i.e. with
> spaces not dots? fread retains exactly what's in the file, whereas read.csv
> runs the column names through base::make.names() which converts the spaces to
> dots to make the column names syntactically valid, iiuc. data.table's general
> policy is to allow spaces and other unusual characters in column names and
> retain them throughout (forgiving the odd bug now fixed caused by some
> make.names calls which should have been make.unique).
>
> To do the same as read.csv :
>
>     DT = fread(...)
>     setnames(DT,make.names(names(DT)))
>
> Not sure I understood correctly and I didn't test.
>
>
> On 28.12.2012 21:36, Hideyoshi Maeda wrote:
>> The sep argument now works thank you!
>>
>> But just out of curiosity…not a major problem of sorts but by using
>> fread(file.path,sep=",") on my csv file, the column names include "."
>> as shown in my original email… but the output result automatically
>> removes the "." in the column name…is there a way to stop it from
>> doing that?, i.e. the first column becomes "Date and Time" when using
>> fread, rather than the original "Date.and.Time" when using read.csv
>>
>>
>> On 26 Dec 2012, at 22:21, Matthew Dowle <[email protected]> wrote:
>>
>>>
>>> sep is now passed through and have added your example as a test.
>>> Hope ok now.
>>>
>>> Thanks,
>>> Matthew
>>>
>>> On 24.12.2012 14:18, Hideyoshi Maeda wrote:
>>>> using autostart=1 gives the following error
>>>>
>>>> Error in fread(file.path, autostart = 1) :
>>>> ' ends field 2 on line 1 when detecting types: Date and
>>>> Time,Open,High,Low,Close,Volume
>>>> 2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
>>>>
>>>>
>>>> On 24 Dec 2012, at 13:48, Matthew Dowle <[email protected]> wrote:
>>>>
>>>>>
>>>>> Yes autostart is the line it detects separators, then it searches upwards
>>>>> to find the first row with the same number of columns. If that row is all
>>>>> character then it deems that as the column name row. So if you start
>>>>> autostart on 1, it's already at the top and it might catch the right
>>>>> separator by avoiding the data rows for separator detection.
>>>>>
>>>>> On 24.12.2012 11:52, Hideyoshi Maeda wrote:
>>>>>> Thanks for the quick response.
>>>>>>
>>>>>> I wasn't sure if I understood you correctly, but isn't the problem
>>>>>> the way that autostart finds separators?
>>>>>>
>>>>>> and in my example, it had headers, so I think it would need to start
>>>>>> from row 2 wouldn't it, i.e. the first row that has non-header values?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On 24 Dec 2012, at 11:44, Matthew Dowle <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Ah yes, haven't hooked up the sep override yet, apologies, will fix.
>>>>>>> Maybe setting autostart to the row number of the header row (probably 1)
>>>>>>> might work.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>>> On 24.12.2012 11:08, Hideyoshi Maeda wrote:
>>>>>>>> oups…forgot to add the output from the verbose part…here it is...
>>>>>>>>
>>>>>>>> Detected eol as \r\n (CRLF) in that order, the Windows standard.
>>>>>>>> Starting format detection on line 30 (the last non blank line in the
>>>>>>>> first 30)
>>>>>>>> Detected sep as '/' and 3 columns
>>>>>>>> Type codes: 003
>>>>>>>> Found first row with 3 fields occuring on line 1 (either column names
>>>>>>>> or first row of data)
>>>>>>>> The first data row has some non character fields. Treating as a data
>>>>>>>> row and using default column names.
>>>>>>>> Count of eol after pos: 1143699
>>>>>>>> Subtracted 1 for last eol and any trailing empty lines, leaving
>>>>>>>> 1143698 data rows
>>>>>>>> 0.153s ( 21%) Memory map (quicker if you rerun)
>>>>>>>> 0.000s (  0%) Format detection
>>>>>>>> 0.095s ( 13%) Count rows (wc -l)
>>>>>>>> 0.001s (  0%) Allocation of 1143698x3 result (xMB) in RAM
>>>>>>>> 0.480s ( 66%) Reading data
>>>>>>>> 0.000s (  0%) Bumping column type midread and coercing data already
>>>>>>>> read
>>>>>>>> 0.002s (  0%) Changing na.strings to NA
>>>>>>>> 0.731s        Total
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24 Dec 2012, at 11:04, Hideyoshi Maeda <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Matthew,
>>>>>>>>>
>>>>>>>>> I am using the new `data.table` `fread()` function to read my csv
>>>>>>>>> files, which has the format as follows when using the read.csv
>>>>>>>>> function
>>>>>>>>>
>>>>>>>>>         Date.and.Time Open High  Low Close Volume
>>>>>>>>> 1 2007/01/01 22:51:00 5683 5683 5673  5673     64
>>>>>>>>> 2 2007/01/01 22:52:00 5675 5676 5674  5674     17
>>>>>>>>> 3 2007/01/01 22:53:00 5674 5674 5673  5674     42
>>>>>>>>>
>>>>>>>>> The value of the first column is all of: `2007/01/01 22:53:00`, the
>>>>>>>>> next 5 columns are separated with commas.
>>>>>>>>>
>>>>>>>>> but when reading the same file using fread i get the following output
>>>>>>>>>
>>>>>>>>>     V1 V2                                             V3
>>>>>>>>> 1 2007  1 01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
>>>>>>>>> 2 2007  1 01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
>>>>>>>>> 3 2007  1 01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
>>>>>>>>>
>>>>>>>>> This is because the autodetect is using the "/" as a separator...
>>>>>>>>>
>>>>>>>>> I tried overriding this using the `sep=","` argument but this does
>>>>>>>>> not seem to be used in the function anywhere.
>>>>>>>>>
>>>>>>>>> Furthermore when using verbose I get the following output, which
>>>>>>>>> suggests that I was right in thinking that "/" is used as a separator
>>>>>>>>> rather than ",".
>>>>>>>>>
>>>>>>>>> Is there any way to fix this, so that it correctly reads all 6
>>>>>>>>> columns separately?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> HLM
>>>>>>>>>
>>>>>>>>> On 21 Dec 2012, at 18:28, Matthew Dowle <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi datatablers,
>>>>>>>>>>
>>>>>>>>>> Feedback and bug reports much appreciated :
>>>>>>>>>>
>>>>>>>>>> =====
>>>>>>>>>> New function fread(), a fast and friendly file reader.
>>>>>>>>>> * header, skip, nrows, sep and colClasses are all auto detected.
>>>>>>>>>> * integers>2^31 are detected and read natively as bit64::integer64.
>>>>>>>>>> * accepts filenames, URLs and "A,B\n1,2\n3,4" directly
>>>>>>>>>> * new implementation entirely in C
>>>>>>>>>> * with a 50MB .csv, 1 million rows x 6 columns :
>>>>>>>>>>     read.csv("test.csv") # 30-60 sec
>>>>>>>>>>     read.table("test.csv",<all known tricks, known nrows>) # 10 sec
>>>>>>>>>>     fread("test.csv") # 3 sec
>>>>>>>>>> * airline data: 658MB csv (7 million rows x 29 columns)
>>>>>>>>>>     read.table("2008.csv",<all known tricks, known nrows>) # 360 sec
>>>>>>>>>>     fread("2008.csv") # 50 sec
>>>>>>>>>> See ?fread. Many thanks to Chris Neff and Garrett See for ideas,
>>>>>>>>>> discussions and beta testing.
>>>>>>>>>> =====
>>>>>>>>>>
>>>>>>>>>> 1.8.7 is passing checks on Unix and Windows (but not Mac yet) :
>>>>>>>>>>
>>>>>>>>>>     install.packages("data.table", repos="http://R-Forge.R-project.org")
>>>>>>>>>>     require(data.table)
>>>>>>>>>>     ?fread
>>>>>>>>>>     fread("your biggest baddest file")
>>>>>>>>>>
>>>>>>>>>> Oddly, R-Forge appears to be compiling Win64 with -O2 optimization
>>>>>>>>>> rather than -O3 (but -O3 on Win32 ok), so speedups might not be as
>>>>>>>>>> great on Win64 until that can be resolved on R-Forge, unless you
>>>>>>>>>> compile yourself. -O3 has some optimizations that fread may benefit
>>>>>>>>>> from. But interested to hear.
>>>>>>>>>>
>>>>>>>>>> Season's greetings!
>>>>>>>>>>
>>>>>>>>>> Matthew
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> datatable-help mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
