At 6:59 AM +0530 12/18/07, Yuvaraj Athur Raghuvir wrote:
Thanks for the interesting discussion. What I got so far is summarized below:
1) Row-based versus column-based storage is an implementation detail.
2) The SQL used for access is independent of the storage mechanism adopted.
3) Row-based storage with indices on all columns reaches the read performance of column-based storage.
4) Creating/updating indices quickly using new algorithms is a direction of improvement for SQLite.
This difference is an implementation detail mainly in the sense that your database schema and the DBMS API can be used unchanged with either. However, the two have different performance characteristics, which is why one would pick one over the other.
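To make that concrete, here is a minimal Python sketch (not how SQLite or any real DBMS lays out pages) of one logical table held in two hypothetical classes, `RowStore` and `ColumnStore`, behind the same calls. The `values_touched` counter is an assumed and very crude cost model: it pretends a row store must read a whole row to reach one field, and it ignores pages, caching and compression.

```python
from typing import Any, Dict, List, Sequence, Tuple


class RowStore:
    """Row-oriented layout: each record is kept together as one tuple."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)
        self.rows: List[Tuple[Any, ...]] = []
        self.values_touched = 0  # crude cost model: number of stored values read

    def insert(self, record: Dict[str, Any]) -> None:
        # One append writes the whole record in a single place.
        self.rows.append(tuple(record[c] for c in self.columns))

    def sum_column(self, name: str) -> Any:
        idx = self.columns.index(name)
        total = 0
        for row in self.rows:
            self.values_touched += len(row)  # assume the whole row is read to reach one field
            total += row[idx]
        return total


class ColumnStore:
    """Column-oriented layout: each column is kept together as its own list."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)
        self.data: Dict[str, List[Any]] = {c: [] for c in self.columns}
        self.values_touched = 0

    def insert(self, record: Dict[str, Any]) -> None:
        # One logical insert touches every column list.
        for c in self.columns:
            self.data[c].append(record[c])

    def sum_column(self, name: str) -> Any:
        values = self.data[name]
        self.values_touched += len(values)  # only the requested column is read
        return sum(values)


# Same "schema", same calls, same answer; only the amount of data read differs.
cols = ["id", "name", "price"]
row_db, col_db = RowStore(cols), ColumnStore(cols)
for i in range(1, 4):
    record = {"id": i, "name": f"item{i}", "price": i * 10}
    row_db.insert(record)
    col_db.insert(record)

print(row_db.sum_column("price"), col_db.sum_column("price"))  # 60 60
print(row_db.values_touched, col_db.values_touched)            # 9 3
```

The column layout reads less for a single-column aggregate, while each insert has to touch every column list; that trade-off is the kind of performance difference that drives the choice.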
If a DBMS is smart enough, it can automatically pick the best-performing storage method, and you don't have to think about it.
However, many DBMSs are not that smart, so users typically find themselves making explicit changes to their schemas, specifying the storage method explicitly to compensate and/or to give the DBMS hints. In these situations, what should be an implementation detail can have a large impact on your schema design.
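The kind of decision a "smart enough" DBMS would make on its own, instead of relying on explicit schema hints, can be sketched as a simple heuristic. Everything here is hypothetical: `WorkloadStats`, `choose_storage` and both thresholds are illustrative assumptions, not a feature of SQLite or of any particular DBMS.

```python
from dataclasses import dataclass


@dataclass
class WorkloadStats:
    """Hypothetical per-table statistics a DBMS might collect about its workload."""
    writes: int              # INSERT/UPDATE/DELETE statements
    point_reads: int         # lookups that fetch a few whole rows
    scans: int               # aggregate/scan queries over many rows
    avg_columns_read: float  # mean number of columns referenced by those scans
    total_columns: int


def choose_storage(stats: WorkloadStats,
                   min_scan_share: float = 0.5,
                   max_column_fraction: float = 0.5) -> str:
    """Return "column" when narrow scans dominate the workload, otherwise "row"."""
    total = stats.writes + stats.point_reads + stats.scans
    if total == 0 or stats.total_columns == 0:
        return "row"  # no evidence yet: keep the conventional default
    scan_share = stats.scans / total
    column_fraction = stats.avg_columns_read / stats.total_columns
    # Column storage tends to win when most statements scan many rows but read
    # only a few columns; row storage tends to win for whole-row writes and lookups.
    if scan_share > min_scan_share and column_fraction < max_column_fraction:
        return "column"
    return "row"


olap = WorkloadStats(writes=10, point_reads=5, scans=85, avg_columns_read=2.0, total_columns=20)
oltp = WorkloadStats(writes=70, point_reads=25, scans=5, avg_columns_read=3.0, total_columns=20)
print(choose_storage(olap), choose_storage(oltp))  # column row
```

A real system would need far richer statistics and would have to weigh the cost of converting existing data, but the shape of the decision is the same.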
Now, if the storage is an implementation detail, can the following scenario be realized?
a) Given: a distributed, highly available system that is implemented by maintaining replicas of the data.
b) The replicas of the data use different storage mechanisms, which are also recorded in the (distributed) database coordinator.
c) This would, in essence, be a hybrid database: hybrid in the sense of using different data storage strategies (row-based / column-based) in the replicas.
This would allow the database coordinator to respond intelligently to the various operations on the database by redirecting each request to the appropriate replica. The cost comes when the data changes and each of the replicas has to be brought into sync. Here again, the intelligence should be such that the storage schema achieving the best performance for a given SQL statement is used, and the sync can happen in the background. My perspective is that data storage (implementation) strategies will progressively play an important role, given that OLTP/OLAP requirements are becoming blurred.
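The routing part of this scenario can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not an existing system: the `Replica` and `HybridCoordinator` names are made up, both replicas hold their data in a plain dict (the physical row/column layout is abstracted away), writes are applied first to the row replica, and an explicit `sync()` call stands in for background replication.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, List, Optional, Tuple

Record = Dict[str, Any]


@dataclass
class Replica:
    name: str
    storage: str  # "row" or "column"; the coordinator records this per replica
    # Simplification: a dict stands in for the replica's real physical layout.
    rows: Dict[int, Record] = field(default_factory=dict)
    pending: Deque[Tuple[int, Record]] = field(default_factory=deque)

    def apply(self, key: int, record: Record) -> None:
        self.rows[key] = record


class HybridCoordinator:
    """Hypothetical coordinator; assumes exactly one replica per storage kind."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = {r.storage: r for r in replicas}

    def route(self, kind: str) -> Replica:
        # Scans/aggregates go to the column replica, everything else to the row replica.
        return self.replicas["column" if kind == "scan" else "row"]

    def write(self, key: int, record: Record) -> None:
        # Apply synchronously on the row replica (cheap whole-row writes) and
        # queue the change for every other replica.
        primary = self.replicas["row"]
        primary.apply(key, dict(record))
        for replica in self.replicas.values():
            if replica is not primary:
                replica.pending.append((key, dict(record)))

    def sync(self) -> int:
        # Drain queued writes. A real system would run this in the background
        # (e.g. a separate thread or process); here it is called explicitly.
        applied = 0
        for replica in self.replicas.values():
            while replica.pending:
                key, record = replica.pending.popleft()
                replica.apply(key, record)
                applied += 1
        return applied

    def lookup(self, key: int) -> Optional[Record]:
        return self.route("point").rows.get(key)

    def sum_column(self, column: str) -> Any:
        return sum(r[column] for r in self.route("scan").rows.values())


coord = HybridCoordinator([Replica("r1", "row"), Replica("r2", "column")])
coord.write(1, {"price": 10})
coord.write(2, {"price": 20})
print(coord.lookup(1))            # {'price': 10}
print(coord.sum_column("price"))  # 0: the column replica has not been synced yet
print(coord.sync())               # 2
print(coord.sum_column("price"))  # 30
```

The stale answer before `sync()` is the synchronization cost described in the scenario: until the replicas are brought back in sync, a query routed to a lagging replica sees old data.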
That could all be made to work, but I don't know whether anyone has actually implemented this yet ... or maybe that was your intention.
-- Darren Duncan