Interesting. How much faster do you find read.csv is when reading gzipped files?
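One rough way to measure this is to time read.csv on a plain and a gzipped copy of the same data. The sketch below uses a synthetic data frame as a stand-in (an assumption; substitute a real file) and only reports elapsed times and file sizes, without claiming any particular result. A freshly written plain file is probably still in the OS page cache, which favors it, so a fair comparison needs a cold cache or files larger than RAM.

```r
# Rough timing sketch: read.csv on a plain vs. gzipped copy of the same data.
# The synthetic data frame below is a stand-in for a real file (assumption).
set.seed(1)
n <- 2e5
df <- data.frame(a = seq_len(n), b = runif(n),
                 c = sample(letters, n, replace = TRUE))

plain_path <- tempfile(fileext = ".csv")
gz_path    <- tempfile(fileext = ".csv.gz")
write.csv(df, plain_path, row.names = FALSE)

# Write the gzipped copy through a gzfile() connection
gz_con <- gzfile(gz_path, "w")
write.csv(df, gz_con, row.names = FALSE)
close(gz_con)

# Time each read; read.csv opens and closes the unopened gzfile() connection itself
t_plain <- system.time(r_plain <- read.csv(plain_path))[["elapsed"]]
t_gz    <- system.time(r_gz    <- read.csv(gzfile(gz_path)))[["elapsed"]]

cat(sprintf("plain: %.2fs (%.1f MB)   gz: %.2fs (%.1f MB)\n",
            t_plain, file.size(plain_path) / 2^20,
            t_gz,    file.size(gz_path) / 2^20))
```

Whether the gzipped read wins depends on how disk-bound the read is versus the CPU cost of decompression.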
On 02.04.2013 20:36, Nathaniel Graham wrote:
> Thanks, but I suspect that it would take longer to set up and then remove
> a ramdisk than it would to use read.csv and data.table. My files are
> moderately large (between 200 MB and 3 GB when compressed), but not
> enormous; I gzip them not so much to save disk space as to speed up reads.
>
> -------
> Nathaniel Graham
> [email protected]
> [email protected]
>
> On Tue, Apr 2, 2013 at 3:12 PM, Matthew Dowle <[email protected]> wrote:
>
>> Hi,
>>
>> fread memory-maps the entire uncompressed file, and this is baked into the way it works (e.g. jumping to the first, middle and last 5 rows to detect column types before it starts reading the rows in); that is where its convenience and speed come from.
>>
>> Probably the fastest way is to uncompress the .gz to a ramdisk first and then fread the uncompressed file from that ramdisk. That should still be pretty quick, and I'd guess not much slower than anything we could build into fread (provided you use a ramdisk).
>>
>> Matthew
>>
>> On 02.04.2013 19:30, Nathaniel Graham wrote:
>>
>>> I have a moderately large csv file that's gzipped, but not in a tar
>>> archive, so it's "filename.csv.gz", and I want to read it into a data.table.
>>> I'd like to use fread(), but I can't seem to make it work. I'm currently
>>> using the following:
>>> data.table(read.csv(gzfile("filename.csv.gz","r")))
>>> Various combinations of gzfile, gzcon, file, readLines, and
>>> textConnection all produce an error (invalid input). Is there a better
>>> way to read in large, compressed files?
>>>
>>> -------
>>> Nathaniel Graham
>>> [email protected]
>>> [email protected]
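The decompress-then-fread approach suggested above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the helper name fread_gz_via_tmp is hypothetical, /dev/shm is assumed to be a RAM-backed tmpfs (as it typically is on Linux), and when it is absent or not writable the sketch falls back to R's tempdir() on disk. Decompression is done in R through a binary gzfile() connection rather than an external gunzip.

```r
library(data.table)

# Hypothetical helper: decompress a .gz file to a plain temporary file,
# fread it, then delete the temporary copy.
fread_gz_via_tmp <- function(gz_path, chunk_bytes = 1e7) {
  # Assumption: /dev/shm is RAM-backed; otherwise use R's on-disk tempdir()
  use_shm <- dir.exists("/dev/shm") && file.access("/dev/shm", 2) == 0
  tmpdir <- if (use_shm) "/dev/shm" else tempdir()

  out_path <- tempfile(tmpdir = tmpdir, fileext = ".csv")
  on.exit(unlink(out_path), add = TRUE)  # remove the copy once we return

  inp <- gzfile(gz_path, "rb")  # binary mode: readBin yields decompressed bytes
  out <- file(out_path, "wb")
  tryCatch({
    repeat {
      chunk <- readBin(inp, what = "raw", n = chunk_bytes)
      if (length(chunk) == 0L) break
      writeBin(chunk, out)
    }
  }, finally = {
    close(inp)
    close(out)  # flush the copy to the file before fread opens it
  })

  fread(out_path)  # fread now works on an ordinary uncompressed file
}

# Usage with a small self-made gzipped CSV standing in for "filename.csv.gz"
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
write.csv(data.frame(id = 1:5, x = c(1.5, 2.5, 3.5, 4.5, 5.5)),
          con, row.names = FALSE)
close(con)

dt <- fread_gz_via_tmp(gz_path)
print(dt)
```

Because the result is fully in memory once fread returns, the temporary copy can be deleted immediately. The cost is that the uncompressed file must fit in the ramdisk (or free disk space) for a moment, which for files that are 3 GB compressed may be considerable.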
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
