Re: [R] Conditional read-in of data

2009-11-04 Thread jim holtman
That does not seem like a large data set.  How are you reading it?
How many columns does it have?  What is a lot of time by your
definition?  You have provided minimal data for obtaining help.  I
commonly read in files with 300K rows in under 30 seconds.  Maybe you
need to consider a relational database for storing your data.
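
To put a number on "a lot of time", the read itself can be timed. The following is only a minimal sketch: the temporary file, its 40,000 x 5 numeric layout, and the colClasses/nrows hints are illustrative assumptions, not details taken from the original post.

```r
# Stand-in file: 40,000 rows x 5 numeric columns (hypothetical shape).
set.seed(1)
tmp <- tempfile(fileext = ".txt")
write.table(matrix(rnorm(40000 * 5), ncol = 5), tmp,
            row.names = FALSE, col.names = FALSE)

# Time a plain read.table() call.
t_plain <- system.time(d1 <- read.table(tmp))

# Declaring the column type and row count up front spares read.table()
# from guessing them, which often helps on larger files.
t_typed <- system.time(
  d2 <- read.table(tmp, colClasses = "numeric", nrows = 40000)
)

# Elapsed wall-clock seconds for each variant.
print(c(plain = t_plain[["elapsed"]], typed = t_typed[["elapsed"]]))
```

Reporting timings like these, together with the number of columns and the exact read call, is the kind of detail that makes a slow read easier to diagnose.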

On Wed, Nov 4, 2009 at 12:07 AM, mnstn pavan.n...@gmail.com wrote:

 Hello All,
 I have a 40k rows long data set that is taking a lot of time to be read-in.
 Is there a way to skip reading even/odd numbered rows or read-in only rows
 that are multiples of, say, 10? This way I get the general trend of the data
 w/o actually reading the entire thing. The option 'skip' in read.table
 simply skips the first n rows and reads the rest. I do understand that once
 the full data set (40k rows) is read-in, I can manipulate the data. But the
 bottle-neck here is the first read/scan of data.

 I searched in the forum using key words (conditional skip/skip reading
 rows/skip data/conditional data read) etc. but couldn't find relevant
 conversations. I apologize if this has already been discussed since it does
 seem hard to imagine that nobody has come across this problem yet.

 Any suggestions/comments are welcome.
 Thanks,
 mnstn
 --
 View this message in context: 
 http://old.nabble.com/Conditional-read-in-of-data-tp26191091p26191091.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Conditional read-in of data

2009-11-04 Thread Gabor Grothendieck
1. You can pipe your data through a gawk (or other scripting language)
process, as in:
http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2129.html

2. read.csv.sql in the sqldf package on CRAN will set up a database
for you, read the file into the database (automatically defining the
layout of the table), extract a portion into R based on an SQL
statement that you provide, and then destroy the database, all in one
statement.  It uses the SQLite database, which is included in the
RSQLite R package that it depends on, so there is nothing to install
separately.
See ?read.csv.sql in the package and also see example 13 on the home page:
http://sqldf.googlecode.com
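
Both suggestions can be sketched as follows. This is a hedged illustration, not code from the linked pages: the stand-in CSV file and its columns are made up, the awk one-liner assumes a Unix-like shell, and relying on SQLite's implicit rowid column to pick every 10th row is an assumption about how read.csv.sql builds its table.

```r
# Stand-in data file: 100 rows, two columns (hypothetical; not from the thread).
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100, value = (1:100)^2), csv,
          row.names = FALSE, quote = FALSE)

# Option 1: let awk keep the header (NR == 1) plus every 10th data row,
# so R only ever parses the filtered stream.
if (nzchar(Sys.which("awk"))) {
  cmd <- paste("awk 'NR == 1 || (NR - 1) % 10 == 0'", shQuote(csv))
  every10_awk <- read.csv(pipe(cmd))
  print(every10_awk$id)
}

# Option 2: read.csv.sql() loads the file into a temporary SQLite database
# and returns only the rows selected by the SQL statement. Inside the
# statement the table is referred to as "file". Filtering on rowid is an
# assumption (see the note above), so check the result on a small file first.
if (requireNamespace("sqldf", quietly = TRUE)) {
  every10_sql <- sqldf::read.csv.sql(
    csv, sql = "select * from file where rowid % 10 = 0"
  )
  print(every10_sql$id)
}
```

Either way the full file is still scanned once, but only the selected rows are turned into an R data frame.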






Re: [R] Conditional read-in of data

2009-11-04 Thread mnstn

Hello Jim and Gabor,
Thanks for your inputs. The lines:

a <- as.matrix(read.table(pipe("awk -f cut.awk Data.file")))
cut.awk: {for (i = 1; i <= NF; i = i + 10) print $i, ""}

solved my problem. I know that 40k lines is not a large data set. I have
about 150 files each of which has 40k rows and in each file I wanted to
visualize (basically to ensure nothing odd is going on) how the data behaves
in each quarter of the data w/o making 150 figures/pdf files. In future as
my data size increases I will consider using relational databases.

Thanks again,
mnstn
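
A note for readers who want every 10th row rather than every 10th field: in awk, NF counts the fields within a line and NR counts the lines, so the loop as posted steps over fields, while a row filter would test NR instead. The same row thinning can also be sketched in base R without awk. The helper name read_every_nth, the stand-in file, and the step of 10 are illustrative assumptions. Note that readLines() still pulls every line into memory as text, so this only avoids parsing the skipped rows.

```r
# Hypothetical helper: parse only every n-th line of a whitespace-delimited file.
read_every_nth <- function(path, n = 10) {
  lines <- readLines(path)                   # raw text, nothing parsed yet
  keep  <- lines[seq_along(lines) %% n == 0] # lines n, 2n, 3n, ...
  read.table(text = keep)                    # parse just the kept lines
}

# Stand-in for one 40k-row file (made-up contents).
tmp <- tempfile()
write.table(cbind(1:40000, sin((1:40000) / 2000)), tmp,
            row.names = FALSE, col.names = FALSE)

thin <- read_every_nth(tmp, n = 10)
print(dim(thin))
```

Applied over a vector of file names with lapply(), a helper like this would give one thinned data frame per file for a quick visual check.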


 
 

-- 
View this message in context: 
http://old.nabble.com/Conditional-read-in-of-data-tp26191091p26197793.html
Sent from the R help mailing list archive at Nabble.com.
