Hey folks,
I'm currently deciding whether to build a decent sized (around
300-500GB, although honestly, I've got little to base that on at the
moment) data warehouse in postgreSQL or MySQL. I've developed several
in MS SQL and postgreSQL, but the client is comfortable with MySQL, and
I'd prefer to use that as the platform since it will be less painful for
them to manage when I'm gone. I'm hoping that someone with experience
building a warehouse on MySQL will be able to answer two outstanding
questions I have:
1) Several sources seem to suggest MyISAM is a good choice for data
warehousing, but due to my lack of experience in a transaction-less
world, this makes me a little nervous. How do you handle data
inconsistency problems when ETL jobs fail? (For the record, I don't use
a separate tool for the ETL; I usually use perl/shell scripts to
interact with the file system, and pl/pgsql or transact-sql once the
data is loaded into the staging database. For each file that is loaded,
I'll identify steps that must be posted together, and wrap them in a
transaction in the ETL job.) I can see doing something like manually
cleaning out the necessary tables before you re-run, but that seems a
bit messy to me. Anyone figure out a better approach?
2) Isn't the lack of bitmap indexes a problem in the warehouse? Most FKs
in the fact tables will be low cardinality columns; queries that didn't
use date would be very slow on large fact tables (MS SQL had this
problem). Has anyone run into this with MySQL?
Many thanks in advance!
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]