No problem about confirming. Thanks again for fixing it.

As for the file itself having "Date and Time", you are right. I just assumed 
that this function was designed to replace and speed up read.csv, i.e. to 
work in exactly the same way but faster. Thanks for letting me know about the 
make.names call though.
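
For anyone finding this thread later, here is a minimal sketch of the difference discussed above. It assumes only the header and first data row quoted further down in this thread; the variable names (csv_text, df_checked, df_raw, fixed, DT) are made up for illustration, and the fread part assumes a data.table version in which sep is passed through. Treat it as a sketch, not a tested recipe for every file.

```r
# Header and first data row as quoted in this thread (illustrative input only).
csv_text <- paste0(
  "Date and Time,Open,High,Low,Close,Volume\n",
  "2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64\n"
)

# read.csv runs the header through make.names() by default (check.names = TRUE),
# so the spaces become dots.
df_checked <- read.csv(text = csv_text)
names(df_checked)[1]   # "Date.and.Time"

# With check.names = FALSE, read.csv keeps the header exactly as in the file,
# which matches what fread does.
df_raw <- read.csv(text = csv_text, check.names = FALSE)
names(df_raw)[1]       # "Date and Time"

# The same conversion applied by hand to names that still contain spaces.
fixed <- make.names(names(df_raw))
fixed[1]               # "Date.and.Time"

# The fread equivalent suggested in this thread; only run if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  DT <- fread(csv_text, sep = ",")
  setnames(DT, make.names(names(DT)))   # renames by reference, no copy of DT
  print(names(DT))
}
```

So the two readers differ only in whether the names are post-processed; either behaviour can be had from either function.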



On 28 Dec 2012, at 22:06, Matthew Dowle <[email protected]> wrote:

> 
> Great. Thanks for confirming.
> 
> The file itself has "Date and Time" as the column name, doesn't it, i.e. with 
> spaces, not dots? fread retains exactly what's in the file, whereas read.csv 
> runs the column names through base::make.names(), which converts the spaces to 
> dots to make the column names syntactically valid, iiuc. data.table's general 
> policy is to allow spaces and other unusual characters in column names and 
> retain them throughout (forgiving the odd bug, now fixed, caused by some 
> make.names calls which should have been make.unique).
> 
> To do the same as read.csv :
> 
>    DT = fread(...)
>    setnames(DT,make.names(names(DT)))
> 
> Not sure I understood correctly and I didn't test.
> 
> 
> On 28.12.2012 21:36, Hideyoshi Maeda wrote:
>> The sep argument now works thank you!
>> 
>> But just out of curiosity (not a major problem): when using
>> fread(file.path,sep=",") on my csv file, the column names include ".",
>> as shown in my original email, but the output automatically
>> removes the "." from the column names. Is there a way to stop it from
>> doing that? I.e. the first column becomes "Date and Time" when using
>> fread, rather than the original "Date.and.Time" when using read.csv.
>> 
>> 
>> On 26 Dec 2012, at 22:21, Matthew Dowle <[email protected]> wrote:
>> 
>>> 
>>> sep is now passed through and have added your example as a test.
>>> Hope ok now.
>>> 
>>> Thanks,
>>> Matthew
>>> 
>>> On 24.12.2012 14:18, Hideyoshi Maeda wrote:
>>>> using autostart=1 gives the following error
>>>> 
>>>> Error in fread(file.path, autostart = 1) :
>>>> ' ends field 2 on line 1 when detecting types: Date and
>>>> Time,Open,High,Low,Close,Volume
>>>> 2007/01/01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
>>>> 
>>>> 
>>>> On 24 Dec 2012, at 13:48, Matthew Dowle <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> Yes, autostart is the line on which it detects the separator; it then searches 
>>>>> upwards to find the first row with the same number of columns. If that row is 
>>>>> all character then it deems that the column name row. So if you set 
>>>>> autostart to 1, it's already at the top and it might catch the right 
>>>>> separator by avoiding the data rows for separator detection.
>>>>> 
>>>>> On 24.12.2012 11:52, Hideyoshi Maeda wrote:
>>>>>> Thanks for the quick response.
>>>>>> 
>>>>>> I wasn't sure if I understood you correctly, but isn't the problem
>>>>>> the way that autostart finds separators?
>>>>>> 
>>>>>> and in my example, it had headers, so I think it would need to start
>>>>>> from row 2 wouldn't it, i.e. the first row that has non-header values?
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> On 24 Dec 2012, at 11:44, Matthew Dowle <[email protected]> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Ah yes, haven't hooked up the sep override yet, apologies, will fix.
>>>>>>> Maybe setting autostart to the row number of the header row (probably 1)
>>>>>>> might work.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Matthew
>>>>>>> 
>>>>>>> 
>>>>>>> On 24.12.2012 11:08, Hideyoshi Maeda wrote:
>>>>>>>> Oops, I forgot to add the output from the verbose part. Here it is:
>>>>>>>> 
>>>>>>>> Detected eol as \r\n (CRLF) in that order, the Windows standard.
>>>>>>>> Starting format detection on line 30 (the last non blank line in the
>>>>>>>> first 30)
>>>>>>>> Detected sep as '/' and 3 columns
>>>>>>>> Type codes: 003
>>>>>>>> Found first row with 3 fields occuring on line 1 (either column names
>>>>>>>> or first row of data)
>>>>>>>> The first data row has some non character fields. Treating as a data
>>>>>>>> row and using default column names.
>>>>>>>> Count of eol after pos: 1143699
>>>>>>>> Subtracted 1 for last eol and any trailing empty lines, leaving
>>>>>>>> 1143698 data rows
>>>>>>>> 0.153s ( 21%) Memory map (quicker if you rerun)
>>>>>>>> 0.000s (  0%) Format detection
>>>>>>>> 0.095s ( 13%) Count rows (wc -l)
>>>>>>>> 0.001s (  0%) Allocation of 1143698x3 result (xMB) in RAM
>>>>>>>> 0.480s ( 66%) Reading data
>>>>>>>> 0.000s (  0%) Bumping column type midread and coercing data already 
>>>>>>>> read
>>>>>>>> 0.002s (  0%) Changing na.strings to NA
>>>>>>>> 0.731s        Total
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 24 Dec 2012, at 11:04, Hideyoshi Maeda <[email protected]> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Matthew,
>>>>>>>>> 
>>>>>>>>> I am using the new `data.table` `fread()` function to read my csv 
>>>>>>>>> files, which have the following format when read with the read.csv 
>>>>>>>>> function
>>>>>>>>> 
>>>>>>>>>        Date.and.Time Open High  Low Close Volume
>>>>>>>>> 1 2007/01/01 22:51:00 5683 5683 5673  5673     64
>>>>>>>>> 2 2007/01/01 22:52:00 5675 5676 5674  5674     17
>>>>>>>>> 3 2007/01/01 22:53:00 5674 5674 5673  5674     42
>>>>>>>>> 
>>>>>>>>> The value of the first column is all of: `2007/01/01 22:53:00`, the 
>>>>>>>>> next 5 columns are separated with commas.
>>>>>>>>> 
>>>>>>>>> but when reading the same file using fread i get the following output
>>>>>>>>> 
>>>>>>>>>    V1 V2                                             V3
>>>>>>>>> 1 2007  1 01 22:51:00,5683.00,5683.00,5673.00,5673.00,64
>>>>>>>>> 2 2007  1 01 22:52:00,5675.00,5676.00,5674.00,5674.00,17
>>>>>>>>> 3 2007  1 01 22:53:00,5674.00,5674.00,5673.00,5674.00,42
>>>>>>>>> 
>>>>>>>>> This is because the autodetect is using the "/" as a separator...
>>>>>>>>> 
>>>>>>>>> I tried overriding this using the `sep=","` argument but this does 
>>>>>>>>> not seem to be used in the function anywhere.
>>>>>>>>> 
>>>>>>>>> Furthermore, when using verbose I get the following output, which 
>>>>>>>>> suggests that I was right in thinking that "/" is used as a separator 
>>>>>>>>> rather than ",".
>>>>>>>>> 
>>>>>>>>> Is there any way to fix this, so that it correctly reads all 6 
>>>>>>>>> columns separately?
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> HLM
>>>>>>>>> 
>>>>>>>>> On 21 Dec 2012, at 18:28, Matthew Dowle <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi datatablers,
>>>>>>>>>> 
>>>>>>>>>> Feedback and bug reports much appreciated :
>>>>>>>>>> 
>>>>>>>>>> =====
>>>>>>>>>> New function fread(), a fast and friendly file reader.
>>>>>>>>>> * header, skip, nrows, sep and colClasses are all auto detected.
>>>>>>>>>> * integers>2^31 are detected and read natively as bit64::integer64.
>>>>>>>>>> * accepts filenames, URLs and "A,B\n1,2\n3,4" directly
>>>>>>>>>> * new implementation entirely in C
>>>>>>>>>> * with a 50MB .csv, 1 million rows x 6 columns :
>>>>>>>>>> read.csv("test.csv")                                   # 30-60 sec
>>>>>>>>>> read.table("test.csv",<all known tricks, known nrows>) #    10 sec
>>>>>>>>>> fread("test.csv")                                      #     3 sec
>>>>>>>>>> * airline data: 658MB csv (7 million rows x 29 columns)
>>>>>>>>>> read.table("2008.csv",<all known tricks, known nrows>) #   360 sec
>>>>>>>>>> fread("2008.csv")                                      #    50 sec
>>>>>>>>>> See ?fread. Many thanks to Chris Neff and Garrett See for ideas,
>>>>>>>>>> discussions and beta testing.
>>>>>>>>>> =====
>>>>>>>>>> 
>>>>>>>>>> 1.8.7 is passing checks on Unix and Windows (but not Mac yet) :
>>>>>>>>>> 
>>>>>>>>>> install.packages("data.table", repos="http://R-Forge.R-project.org")
>>>>>>>>>> require(data.table)
>>>>>>>>>> ?fread
>>>>>>>>>> fread("your biggest baddest file")
>>>>>>>>>> 
>>>>>>>>>> Oddly, R-Forge appears to be compiling Win64 with -O2 optimization rather
>>>>>>>>>> than -O3 (but -O3 on Win32 ok), so speedups might not be as great on Win64
>>>>>>>>>> until that can be resolved on R-Forge, unless you compile yourself. -O3
>>>>>>>>>> has some optimizations that fread may benefit from. But interested to hear.
>>>>>>>>>> 
>>>>>>>>>> Season's greetings!
>>>>>>>>>> 
>>>>>>>>>> Matthew
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> datatable-help mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>>>>>>> 
>>>>>>> 
>>>>> 
> 

