That does not seem like a large data set. How are you reading it?
How many columns does it have? What is "a lot of time" by your
definition? You have provided minimal information for obtaining help. I
commonly read in files with 300K rows in under 30 seconds. Maybe you
need to consider a relational database.
1. You can pipe your data through a gawk (or other scripting-language)
process, as in:
http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2129.html
2. read.csv.sql in the sqldf package on CRAN will set up a database
for you, read the file into the database, automatically defining the
layout of the file.
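As a sketch of option 1 (the file name Data.file and the stride of 10 are
illustrative, and a whitespace-delimited file is assumed), an awk one-liner
can thin the columns before the data ever reach R:

```shell
# Keep fields 1, 11, 21, ... of each line; NF is awk's per-line field count.
awk '{ for (i = 1; i <= NF; i += 10) printf "%s ", $i; print "" }' Data.file
```

In R the same command can be wrapped in pipe() and handed to read.table(),
so only the thinned columns are parsed.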
Hello Jim and Gabor,
Thanks for your inputs. The lines:

a <- as.matrix(read.table(pipe("awk -f cut.awk Data.file")))

with cut.awk containing:

{ for (i = 1; i <= NF; i += 10) printf "%s ", $i; print "" }
solved my problem. I know that 40k lines is not a large data set. I have
about 150 files each of which has 40k rows and in each file I
Hello All,
I have a 40k-row data set that is taking a lot of time to read in.
Is there a way to skip reading even- or odd-numbered rows, or read in only
rows that are multiples of, say, 10? That way I get the general trend of the
data w/o actually reading the entire thing. The option 'skip'
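For row-skipping specifically, one approach (a sketch; Data.file and the
stride of 10 are placeholders) is to let awk's NR record counter filter the
rows before R parses anything:

```shell
# Print only rows 1, 11, 21, ...; NR is awk's 1-based record (row) number.
awk 'NR % 10 == 1' Data.file
```

From R this could be combined with pipe(), e.g. something like
a <- read.table(pipe("awk 'NR % 10 == 1' Data.file")), so R only ever sees
one row in ten.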