Hey folks,

I'm currently deciding whether to build a decent sized (around 300-500GB, although honestly, I've got little to base that on at the moment) data warehouse in postgreSQL or MySQL. I've developed several in MS SQL and postgreSQL, but the client is comfortable with MySQL, and I'd prefer to use that as the platform since it will be less painful for them to manage when I'm gone. I'm hoping that someone with experience building a warehouse on MySQL will be able to answer two outstanding questions I have:

1) Several sources seem to suggest MyISAM is a good choice for data warehousing, but due to my lack of experience in a transaction-less world, this makes me a little nervous. How do you handle data inconsistency problems when ETL jobs fail? (For the record, I don't use a separate tool for the ETL; I usually use perl/shell scripts to interact with the file system, and pl/pgsql or transact-sql once the data is loaded into the staging database. For each file that is loaded, I'll identify steps that must be posted together, and wrap them in a transaction in the ETL job.) I can see doing something like manually cleaning out the necessary tables before you re-run, but that seems a bit messy to me. Anyone figure out a better approach?

2) Isn't the lack of bitmap indexes a problem in the warehouse? Most FKs in the fact tables will be low cardinality columns; queries that didn't use date would be very slow on large fact tables (MS SQL had this problem). Has anyone run into this with MySQL?

Many thanks in advance!

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to