Hi Andrew, nice to see you here :) I have no experience with sharing loads across servers as far as AW/AW-ETL is concerned (although I have thought about splitting the ETL part map/reduce style), and I haven't had to roll up stuff either, so I won't be of much help on that point. Maybe someone else will be able to provide some insights.
A question that popped up after reading your post: are you concerned about performance in the data-building stage, the reporting stage, or both? My own insight (from what I've learned, at least) would be to prototype on a single machine and avoid optimizing until you can't delay it any longer. I've been positively surprised by how fast things can go as long as you only import what you really need, with the proper indices.

In my case, I actually dump the whole production database each night (hundreds of MB); it is then copied to a dedicated machine that does the ETL part, adds dimensions (including a home-cooked date dimension with whatever is relevant: Q1, Q2, day of week, and so on), and creates views. I thought I would switch to incremental loading, but the need never really came up. In your case, I would dump the whole AR table, add a date dimension with what's relevant, see how it goes, then run this on a separate machine nightly or periodically, depending on the freshness you need.

A few things that could help:

- I've set up a continuous integration server with the latest data, and it has made development a lot easier.
- You'll want to have a look at the "screen" feature: basically tests that you can run against the data (mine are launched once the data is loaded into the database).
- The ETL manual: http://activewarehouse.rubyforge.org/docs/activewarehouse-etl.html

Feel free to ask questions as they come up here!

cheers

-- Thibaut
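PS: to make the date-dimension idea a bit more concrete, here's a minimal Ruby sketch of deriving attributes like quarter and day of week from a calendar date. The method name and hash keys are made up for illustration; your actual dimension table will have whatever columns are relevant to you.

```ruby
require 'date'

# Build one row of a home-cooked date dimension from a Date.
# (Hypothetical sketch: column names are just examples.)
def date_dimension_row(date)
  {
    date: date,
    year: date.year,
    quarter: "Q#{((date.month - 1) / 3) + 1}",  # months 1-3 => Q1, 4-6 => Q2, ...
    day_of_week: date.strftime("%A"),           # "Monday", "Tuesday", ...
    weekend: date.saturday? || date.sunday?
  }
end

row = date_dimension_row(Date.new(2008, 5, 17))
puts row[:quarter]      # => "Q2"
puts row[:day_of_week]  # => "Saturday"
```

You would generate one such row per day over your date range and load them into the dimension table once, up front.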
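PPS: the spirit of a "screen" is just a test run against freshly loaded data. Here's a hedged plain-Ruby sketch of that idea; the helper name and row format are my own, and the real AW-ETL screen DSL is documented in the ETL manual linked above.

```ruby
# Hypothetical data-quality check in the spirit of an ETL screen:
# fail loudly if any loaded row is missing its key.
def check_no_null_keys(rows, key)
  bad = rows.select { |r| r[key].nil? }
  raise "screen failed: #{bad.size} rows with NULL #{key}" unless bad.empty?
  true
end

rows = [{ order_id: 1 }, { order_id: 2 }]
check_no_null_keys(rows, :order_id)  # passes quietly
```

I run checks like these right after the load, so a bad nightly dump is caught before anyone looks at a report.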
_______________________________________________
Activewarehouse-discuss mailing list
Activewarehouse-discuss@rubyforge.org
http://rubyforge.org/mailman/listinfo/activewarehouse-discuss