Hi Andrew,
nice to see you here :)

I have no experience with sharing loads across servers as far as AW/AW-ETL
is concerned (although I have thought about splitting the work à la
map/reduce on the ETL side), and I haven't had to roll up data either, so I
won't be of much help on those points. Maybe someone else will be able to
provide some insights.

A question that popped up after reading your post: are you concerned about
performance at the data-building stage, or at the reporting stage (or
both)?

My own advice (from what I've learned, at least) would be to prototype on a
single machine and avoid optimization until you can't delay it anymore.
I've been positively surprised by how fast things can go as long as you only
import what you really need, with the proper indices.

In my case, I actually dump the whole production database each night
(hundreds of MB); it is then copied to a dedicated machine that does the
ETL part, adds dimensions (including a home-cooked date dimension with
whatever is relevant: Q1, Q2, day of week, and so on), and creates views. I
thought I would switch to incremental loads, but the need never really came
up.
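To give an idea, a home-cooked date dimension like the one I mention can be
generated in a few lines of plain Ruby; the column names below are just the
ones I find useful for illustration, not anything AW mandates:

```ruby
require 'date'

# Sketch of a home-cooked date dimension: one row (a Hash) per day,
# with quarter, day of week, etc. Column names are illustrative.
def build_date_dimension(from, to)
  (from..to).map do |d|
    {
      date_key:    d.strftime('%Y%m%d').to_i,  # surrogate key, e.g. 20240101
      date:        d,
      year:        d.year,
      quarter:     "Q#{((d.month - 1) / 3) + 1}",
      month:       d.month,
      day_of_week: Date::DAYNAMES[d.wday],
      weekend:     d.saturday? || d.sunday?
    }
  end
end

rows = build_date_dimension(Date.new(2024, 1, 1), Date.new(2024, 12, 31))
```

You then load these rows into the dimension table once, and the facts just
join against `date_key`.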

I guess in your case I would dump the whole AR table, add a date dimension
with what's relevant, see how it goes, then move this to a separate machine
nightly or periodically, depending on the freshness you need.

A few things that could help:
- I've set up a continuous integration server with the latest data, and it
helped a lot in making development easier
- you'll want to have a look at the "screen" feature: basically tests that
you can run against the data (mine are launched once the data is loaded
into the database)
- http://activewarehouse.rubyforge.org/docs/activewarehouse-etl.html (the
ETL manual)
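To make the "screen" idea concrete: a screen is essentially a set of
assertions run against the freshly loaded data, and failures stop or flag
the load. AW-ETL lets you declare these in the control file (see the manual
linked above); here is the same idea sketched in plain Ruby against an
in-memory sample, with made-up row and column names:

```ruby
# Sketch of a post-load data screen: collect human-readable failures
# instead of raising on the first problem, so a nightly run can report
# everything at once. Rows and column names are hypothetical.
def screen_facts(rows)
  failures = []
  failures << "no rows loaded" if rows.empty?
  rows.each_with_index do |row, i|
    failures << "row #{i}: missing date_key" unless row[:date_key]
    failures << "row #{i}: negative amount"  if row[:amount].to_f.negative?
  end
  failures
end

sample = [
  { date_key: 20240101, amount: 12.5 },
  { date_key: nil,      amount: -3.0 }
]
problems = screen_facts(sample)
```

In my setup the equivalent checks run right after the load, which catches
bad source data before anyone sees it in a report.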

Feel free to ask questions as they come up here!

cheers

-- Thibaut
_______________________________________________
Activewarehouse-discuss mailing list
Activewarehouse-discuss@rubyforge.org
http://rubyforge.org/mailman/listinfo/activewarehouse-discuss