Interesting. How much do you find read.csv is sped up by reading
gzip'd files? 
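
One way to put a number on this is to time both reads on the same data. The following is a minimal sketch, not a rigorous benchmark: the data frame, its size, and the temporary file names are made up for illustration, and the result will depend heavily on disk speed and on whether the files are already in the OS cache.

```r
# Sketch: time read.csv() on a plain CSV versus a gzip'd copy of the same data.
# The data and file names here are invented for illustration only.
n  <- 1e5
df <- data.frame(id = seq_len(n),
                 x  = rnorm(n),
                 g  = sample(letters, n, replace = TRUE))

plain <- tempfile(fileext = ".csv")
gz    <- tempfile(fileext = ".csv.gz")

# Write the same rows twice: once uncompressed, once through a gzfile connection.
write.csv(df, plain, row.names = FALSE)
con <- gzfile(gz, "w")
write.csv(df, con, row.names = FALSE)
close(con)

# Elapsed wall-clock time for each read.
t_plain <- system.time(a <- read.csv(plain))[["elapsed"]]
t_gz    <- system.time(b <- read.csv(gzfile(gz)))[["elapsed"]]

cat("plain:", t_plain, "s   gzip:", t_gz, "s\n")
cat("bytes on disk:", file.size(plain), "vs", file.size(gz), "\n")
```

On a slow disk the smaller compressed file may win because less data is read; on a fast disk or a warm cache the decompression cost may dominate instead, so it is worth repeating the timing on the real files.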

On 02.04.2013 20:36, Nathaniel Graham wrote: 

> Thanks, but I suspect that it would take longer to set up and then remove
> a ramdisk than it would to use read.csv and data.table. My files are
> moderately large (between 200 MB and 3 GB when compressed), but not
> enormous; I gzip not so much to save space on disk but to speed up
> reads.
> 
> -------
> Nathaniel Graham
> [email protected]
> [email protected]
>
> On Tue, Apr 2, 2013 at 3:12 PM, Matthew Dowle <[email protected]> wrote:
> 
>> Hi,
>>
>> fread memory maps the entire uncompressed file, and this is baked into
>> the way it works (e.g. skipping to the beginning, middle and last 5 rows
>> to detect column types before starting to read the rows in). It is also
>> where the convenience and speed come from.
>>
>> You could uncompress the .gz to a ramdisk first, and then fread the
>> uncompressed file from that ramdisk; that is probably the fastest way.
>> It should still be pretty quick, and I guess unlikely to be much slower
>> than anything we could build into fread (provided you use a ramdisk).
>>
>> Matthew
>> 
>> On 02.04.2013 19:30, Nathaniel Graham wrote:
>> 
>>> I have a moderately large csv file that's gzipped, but not in a tar
>>> archive, so it's "filename.csv.gz" that I want to read into a data.table.
>>> I'd like to use fread(), but I can't seem to make it work. I'm currently
>>> using the following:
>>> data.table(read.csv(gzfile("filename.csv.gz","r")))
>>> Various combinations of gzfile, gzcon, file, readLines, and
>>> textConnection all produce an error (invalid input). Is there a better
>>> way to read in large, compressed files?
>>> 
>>> -------
>>> Nathaniel Graham
>>> [email protected]
>>> [email protected]
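
The workaround discussed in the thread (uncompress the .gz first, then read the plain file) can be sketched roughly as follows. This is only an illustration under stated assumptions: `gunzip_to()` is a hypothetical helper written here, not a data.table function; the sample file is generated on the spot; and `tempdir()` stands in for a ramdisk (on many Linux systems a RAM-backed directory such as /dev/shm could be passed as `out_dir`, but that depends on the machine).

```r
# Hypothetical helper: stream-decompress a .gz file into out_dir and
# return the path of the uncompressed copy. Uses base R connections only.
gunzip_to <- function(gz_path, out_dir = tempdir(), chunk = 1e7) {
  out_path <- file.path(out_dir, sub("\\.gz$", "", basename(gz_path)))
  inp <- gzfile(gz_path, "rb")
  on.exit(close(inp), add = TRUE)
  out <- file(out_path, "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    buf <- readBin(inp, what = "raw", n = chunk)  # read up to `chunk` bytes
    if (length(buf) == 0L) break                   # end of compressed stream
    writeBin(buf, out)
  }
  out_path
}

# Build a small gzip'd CSV so the sketch runs on its own.
gz  <- file.path(tempdir(), "filename.csv.gz")
con <- gzfile(gz, "w")
write.csv(data.frame(a = 1:5, b = letters[1:5]), con, row.names = FALSE)
close(con)

csv <- gunzip_to(gz)  # pass out_dir = <ramdisk path> if one is available

if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(csv)   # fread sees an ordinary uncompressed file
} else {
  dt <- read.csv(csv)            # fallback so the sketch runs without data.table
}

unlink(csv)  # remove the uncompressed copy once the data is in memory
```

Whether this beats `data.table(read.csv(gzfile(...)))` for files of several GB is exactly the open question in the thread; the uncompressed copy also needs enough free space in `out_dir`.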

 

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
