Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Matthew Dowle
How it represents data internally is very important, depending on the real goal: http://en.wikipedia.org/wiki/Column-oriented_DBMS Gabor Grothendieck ggrothendi...@gmail.com wrote in message news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com... How it represents data internally

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Gabor Grothendieck
It's only important internally. Externally, it's undesirable for the user to have to get involved in it. The idea of making software easy to write and use is to hide the implementation and focus on the problem. That is why we use high-level languages, object orientation, etc. On Thu, Jan 28, 2010 at

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Matthew Dowle
Are you claiming that SQL is that utopia? SQL is a row store. It cannot give the user the benefits of a column store. For example, why does SQL take 113 seconds in the example in this thread: http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html while data.table takes 5 seconds to get the
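To make the row-store versus column-store distinction concrete, here is a toy in-memory sketch in Python. It is not how data.table or any real DBMS lays out data. The row layout keeps each record's fields together, while the column layout keeps each column in one contiguous array, so an aggregate over a single column reads only that array. The `(Time, demand)` values are copied from R's built-in `BOD` data frame and serve only as sample data.

```python
from array import array

# Sample records (Time, demand), copied from R's built-in BOD data frame.
RECORDS = [(1, 8.3), (2, 10.3), (3, 19.0), (4, 16.0), (5, 15.6), (7, 19.8)]

def to_row_store(records):
    """Row-oriented layout: each record's fields are stored together."""
    return [tuple(r) for r in records]

def to_column_store(records):
    """Column-oriented layout: each column is one contiguous array of doubles."""
    return {
        "Time": array("d", (r[0] for r in records)),
        "demand": array("d", (r[1] for r in records)),
    }

def total_demand_rows(row_store):
    # Must visit every whole record just to pick out its second field.
    return sum(r[1] for r in row_store)

def total_demand_cols(col_store):
    # Reads only the 'demand' array; the 'Time' column is never touched.
    return sum(col_store["demand"])

rows = to_row_store(RECORDS)
cols = to_column_store(RECORDS)
print(total_demand_rows(rows), total_demand_cols(cols))
```

Both layouts give the same answer; the difference is how much unrelated data has to be walked to get it, which grows with the number of columns a real table carries.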

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Gabor Grothendieck
I think one would only be concerned about such internals if one were primarily interested in performance; otherwise, one would be more interested in ease of specification, and part of that ease is having it independent of implementation and separating implementation from specification activities.

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Matthew Dowle
I'm talking about ease of use too. The first line of the Details section in ?[.data.table says: "Builds on base R functionality to reduce 2 types of time: 1. programming time (easier to write, read, debug and maintain) 2. compute time". Once again, I am merely saying that the

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-28 Thread Gabor Grothendieck
Regarding the explanation of where the time goes, it might be parsing the statement or the development of the query plan. The SQL statement for the more complex query is obviously much longer, and its generated query plan involves 95 lines of byte code vs. 19 lines of generated code for the simpler
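Because sqldf runs its queries through SQLite by default, this kind of byte-code comparison can be reproduced with SQLite's `EXPLAIN`, which returns one row per virtual-machine instruction. A minimal sketch using Python's built-in `sqlite3`; the tables `a` and `b` and both queries are hypothetical stand-ins, not the queries from the thread, so the counts will not match 95 and 19.

```python
import sqlite3

def bytecode_length(conn, sql):
    """Number of SQLite VDBE instructions that EXPLAIN reports for a statement."""
    return len(conn.execute("EXPLAIN " + sql).fetchall())

conn = sqlite3.connect(":memory:")
# Hypothetical tables; EXPLAIN only compiles the statement, so they can stay empty.
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, grp TEXT, x REAL);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, y REAL);
""")

simple = "SELECT * FROM a"
complex_ = """SELECT a.grp, SUM(a.x * b.y) FROM a
              JOIN b ON b.a_id = a.id
              GROUP BY a.grp ORDER BY a.grp"""

n_simple = bytecode_length(conn, simple)
n_complex = bytecode_length(conn, complex_)
print(n_simple, n_complex)  # the join + GROUP BY compiles to more instructions
```

Counting instructions only measures the size of the compiled program; it does not by itself show how the run time splits between parsing, planning and execution.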

[R] RMySQL - Bulk loading data and creating FK links

2010-01-27 Thread Nathan S. Watson-Haigh
I have a table (contact) with several fields, and its PK is an auto-increment field. I'm bulk loading data to this table from files which, if successful, will be about 3.5 million rows (approx. 16000 rows per file). However, I have a linking table (an_contact) to resolve an m:m relationship between
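One common way to fill a table with an auto-increment PK and its linking table in bulk, without fetching each generated ID one row at a time, is to load each file into a staging table that carries a natural key, insert the distinct contacts, and then build the linking rows with a join on that key. A minimal sketch using Python's built-in `sqlite3` as a stand-in for MySQL: the `an` table, the `contact_key` and `name` columns, and the `staging` table are hypothetical, since the question does not show the schema. `INSERT OR IGNORE` is SQLite syntax (MySQL's closest equivalent is `INSERT IGNORE`), and for large files MySQL's `LOAD DATA INFILE` would replace the `executemany` step.

```python
import sqlite3

SCHEMA = """
CREATE TABLE an (id INTEGER PRIMARY KEY);
CREATE TABLE contact (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key assigned by the DB
    contact_key TEXT NOT NULL UNIQUE,               -- hypothetical natural key from the files
    name        TEXT
);
CREATE TABLE an_contact (                           -- linking table for the m:m relationship
    an_id      INTEGER NOT NULL REFERENCES an(id),
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    PRIMARY KEY (an_id, contact_id)
);
CREATE TEMP TABLE staging (contact_key TEXT, name TEXT, an_id INTEGER);
"""

def load_file(conn, rows):
    """Bulk-load one file's rows, given as (contact_key, name, an_id) tuples."""
    with conn:  # one transaction per file
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
        # Insert each new contact once; the database assigns contact.id.
        conn.execute("""
            INSERT OR IGNORE INTO contact (contact_key, name)
            SELECT contact_key, MIN(name) FROM staging GROUP BY contact_key""")
        # Resolve the generated ids by joining back on the natural key.
        conn.execute("""
            INSERT OR IGNORE INTO an_contact (an_id, contact_id)
            SELECT s.an_id, c.id FROM staging s
            JOIN contact c ON c.contact_key = s.contact_key""")

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("PRAGMA foreign_keys = ON")
with conn:
    conn.executemany("INSERT INTO an (id) VALUES (?)", [(1,), (2,)])
load_file(conn, [("c1", "Alice", 1), ("c1", "Alice", 2), ("c2", "Bob", 1)])
load_file(conn, [("c2", "Bob", 2), ("c3", "Carol", 1)])
```

The same statements run once per file, so the ~16000-row files can be loaded one at a time; the `UNIQUE` constraint on the natural key is what lets the join recover each generated `id` and keeps a contact seen in several files from being inserted twice.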

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-27 Thread Olga Lyashevska
Hi Nathan, I have a table (contact) with several fields, and its PK is an auto-increment field. I'm bulk loading data to this table from files which, if successful, will be about 3.5 million rows (approx. 16000 rows per file). However, I have a linking table (an_contact) to resolve an m:m

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-27 Thread Gabor Grothendieck
On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com wrote: How many columns, and of what type are the columns? As Olga asked too, it would be useful to know more about what you're really trying to do. 3.5m rows is not actually that many rows, even for 32-bit R. It depends
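Whether 3.5 million rows fit comfortably depends on how many columns there are and of what type, which is why the question is asked. A back-of-the-envelope sketch in Python, assuming only numeric columns and using R's storage sizes of 8 bytes per double and 4 bytes per integer or logical value. Character columns, vector headers and temporary copies are ignored, so real usage will be higher; the 5-double, 2-integer layout is hypothetical.

```python
# Bytes per element of R's atomic vector types (character vectors excluded).
BYTES_PER_VALUE = {"double": 8, "integer": 4, "logical": 4}

def estimate_bytes(n_rows, column_types):
    """Lower-bound memory for a data frame with the given column types."""
    return n_rows * sum(BYTES_PER_VALUE[t] for t in column_types)

# Hypothetical layout: 5 double columns and 2 integer columns.
cols = ["double"] * 5 + ["integer"] * 2
n = estimate_bytes(3_500_000, cols)
print(f"{n / 2**20:.1f} MiB")  # 160.2 MiB
```

At 48 bytes per row this is about 160 MiB, well inside what a 32-bit process can address, but each extra double column adds roughly another 27 MiB.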

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-27 Thread Matthew Dowle
sqldf("select * from BOD order by Time desc limit 3") Exactly. SQL requires the use of ORDER BY. It knows the order, but it isn't ordered. That's not good, but it might be fine, depending on what the real goal is. Gabor Grothendieck ggrothendi...@gmail.com wrote in message
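sqldf sends its query to SQLite by default, so the behaviour can be reproduced with Python's built-in `sqlite3` module. This sketch loads the `BOD` values in a deliberately scrambled order to show that the output order comes from `ORDER BY`, not from how the rows happen to be stored; without `ORDER BY`, SQL does not promise any particular row order. The helper name `top3_by_time` is hypothetical.

```python
import sqlite3

# R's built-in BOD data frame as (Time, demand) pairs.
BOD = [(1, 8.3), (2, 10.3), (3, 19.0), (4, 16.0), (5, 15.6), (7, 19.8)]

def top3_by_time(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE BOD (Time REAL, demand REAL)")
        conn.executemany("INSERT INTO BOD VALUES (?, ?)", rows)
        # A table is an unordered set of rows; only ORDER BY fixes the output order.
        return conn.execute(
            "SELECT * FROM BOD ORDER BY Time DESC LIMIT 3").fetchall()
    finally:
        conn.close()

scrambled = [BOD[i] for i in (3, 0, 5, 2, 4, 1)]
print(top3_by_time(scrambled))  # [(7.0, 19.8), (5.0, 15.6), (4.0, 16.0)]
```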

Re: [R] RMySQL - Bulk loading data and creating FK links

2010-01-27 Thread Gabor Grothendieck
How it represents data internally should not be important as long as you can do what you want. SQL is declarative, so you just specify what you want rather than how to get it, and, invisibly to the user, it automatically draws up a query plan and then uses that plan to get the result. On Wed, Jan
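The automatically drawn-up query plan can be inspected in SQLite, sqldf's default back end, with `EXPLAIN QUERY PLAN`. A minimal sketch with a hypothetical `contact` table: the declarative query text never changes, but once an index exists the planner picks a different access path on its own. The exact wording of the plan's detail strings varies between SQLite versions.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")

query = "SELECT * FROM contact WHERE name = ?"
before = plan(conn, query, ("Alice",))
conn.execute("CREATE INDEX contact_name ON contact (name)")
after = plan(conn, query, ("Alice",))
print(before)  # a full scan of contact
print(after)   # a search using the contact_name index
```

This is the separation of specification from implementation argued for above: the user states only the `WHERE` condition, and the choice between scanning and using the index is made by the engine.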