(Meta: Sorry for not replying in-thread -- I'm posting this from
another email account.)

> I believe you said:
> Article has_many Topics
> Article has_many Regions
>
> Is this correct? If it is then you might have something like this:
>
> ...
> article_facts
> * topic_id
> * region_id
> * hit (always 1)
>
> With this structure you'd be able to aggregate your facts by the
> various attributes in both topic and dimension. Is this starting to
> look like what you are trying to do or am I still missing something
> here?

I think you're still missing something. If I do the above, I'm stuck
with one of two alternatives:
1. I have to choose a "primary" topic and region for each article, and
discard the others, or
2. For each article, I have m*n article_facts, where m is the number
of regions and n is the number of topics.

The former leads to artificially low numbers (an article is counted
under only one of its topics and only one of its regions). The latter
leads to artificially high numbers (an article is counted n times in
every one of its regions, and m times in every one of its topics).
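To make the two failure modes concrete, here's a small Python sketch
of both options for a single made-up article with m = 2 regions and
n = 3 topics. The article, the topic and region names, and the
`sum_hits` helper are all hypothetical; the helper just mimics a naive
SUM(hit) ... GROUP BY over article_facts.

```python
from collections import Counter
from itertools import product

# Hypothetical article: m = 2 regions, n = 3 topics.
article = {"id": 1, "regions": ["EU", "US"], "topics": ["energy", "trade", "tech"]}

# Option 2: one fact row per (topic, region) pair, each with hit = 1.
facts = [
    {"article_id": article["id"], "topic": t, "region": r, "hit": 1}
    for t, r in product(article["topics"], article["regions"])
]

def sum_hits(rows, key):
    """Naive SUM(hit) GROUP BY key over the fact rows."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row["hit"]
    return dict(totals)

by_topic = sum_hits(facts, "topic")    # each topic counts the article m = 2 times
by_region = sum_hits(facts, "region")  # each region counts the article n = 3 times

# Option 1: keep only a "primary" topic and region (here, the first of each).
primary = [{"article_id": article["id"], "topic": article["topics"][0],
            "region": article["regions"][0], "hit": 1}]
primary_by_topic = sum_hits(primary, "topic")  # "trade" and "tech" never see the article
```

With option 2, every topic reports the one article twice and every
region reports it three times; with option 1, "trade" and "tech"
report nothing at all.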

<math-nerd>I suppose I could fudge the numbers a bit by adding a
"weighting" column with the value 1/sqrt(m*n) (the inverse of the
geometric mean of m and n), so the over- and undercounting would come
out right "on average", assuming the number of regions and the number
of topics are independent variables. But then I'd feel dirty.</math-nerd>
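A quick Python check of the weighting idea for the same hypothetical
article (2 regions, 3 topics) shows why it is only right "on average":
per article, a topic sums to sqrt(m/n) and a region to sqrt(n/m), so
both come out to exactly 1 only when m == n. The `weighted_facts` and
`sum_weight` helpers are made up for illustration.

```python
import math
from itertools import product

def weighted_facts(article_id, topics, regions):
    """One fact row per (topic, region) pair, each weighted by 1/sqrt(m*n).

    m = number of regions, n = number of topics.
    """
    m, n = len(regions), len(topics)
    weight = 1 / math.sqrt(m * n)
    return [
        {"article_id": article_id, "topic": t, "region": r, "weight": weight}
        for t, r in product(topics, regions)
    ]

def sum_weight(rows, key, value):
    """SUM(weight) over the rows WHERE key = value."""
    return sum(row["weight"] for row in rows if row[key] == value)

rows = weighted_facts(1, topics=["energy", "trade", "tech"], regions=["EU", "US"])

# A topic gets m rows, so it sums to m / sqrt(m*n) = sqrt(m/n), about 0.816 here.
per_topic = sum_weight(rows, "topic", "energy")
# A region gets n rows, so it sums to n / sqrt(m*n) = sqrt(n/m), about 1.225 here.
per_region = sum_weight(rows, "region", "EU")
# The grand total is m*n / sqrt(m*n) = sqrt(m*n), about 2.449 here, not 1.
total = sum(row["weight"] for row in rows)
```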

Perhaps I scared you by referencing an MSDN technet article. The other
article (http://www.dbmsmag.com/9808d05.html) was written by Ralph
Kimball. :)

Looking at the source code more, it seems to be hard-coded to
recognize HierarchicalBridges, with no real support for pluggable
bridging. (Please correct me if I'm wrong.) If I decide to follow the
approach laid out in the above article, I'd have to modify the source
code of whichever Aggregate I'm using. Does this sound correct?
Feasible? Stupid?
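For reference, here's my rough understanding of the bridge-table idea
as a plain Python sketch, not anything ActiveWarehouse-specific: keep
article_facts at one row per article, and hang each multivalued
attribute off its own bridge table whose weighting factors sum to 1
per article. All names and data are hypothetical; I'm assuming this is
close to what the article describes, and I haven't worked out how it
would map onto ActiveWarehouse's Aggregate classes.

```python
from collections import defaultdict

# Fact table at the article grain: exactly one row per article (hypothetical data).
article_facts = [
    {"article_id": 1, "hit": 1},
    {"article_id": 2, "hit": 1},
]

# Multivalued attributes, kept out of the fact table.
article_topics = {1: ["energy", "trade", "tech"], 2: ["energy"]}
article_regions = {1: ["EU", "US"], 2: ["US"]}

def build_bridge(groups):
    """One bridge row per (article, member), weighted 1/k for a group of size k."""
    return [
        {"article_id": a, "member": member, "weight": 1 / len(members)}
        for a, members in groups.items()
        for member in members
    ]

topic_bridge = build_bridge(article_topics)
region_bridge = build_bridge(article_regions)

def report(facts, bridge, weighted):
    """Join facts to one bridge and sum hits per bridge member.

    weighted=True:  each article contributes exactly 1 in total across its members.
    weighted=False: each article counts fully under every member it has.
    """
    hits = {f["article_id"]: f["hit"] for f in facts}
    totals = defaultdict(float)
    for b in bridge:
        factor = b["weight"] if weighted else 1
        totals[b["member"]] += hits[b["article_id"]] * factor
    return dict(totals)

unweighted_by_topic = report(article_facts, topic_bridge, weighted=False)
weighted_by_topic = report(article_facts, topic_bridge, weighted=True)
unweighted_by_region = report(article_facts, region_bridge, weighted=False)
```

The weighted totals add up to the real number of articles, while the
unweighted ones answer "how many articles mention this topic?".
Because each bridge is joined on its own, slicing by topic never
multiplies by the number of regions.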

I'd rather use ActiveWarehouse if I can, because it looks like there's
a lot of knowledge here I could benefit from, and a lot of boilerplate
I could save. But without a solution to the above, I suppose I'll just
start hard-coding some aggregate tables. Not that I blame the authors:
it's no fault of yours if none of your data has multivalued
dimensions. If there's another grain I could model the data at that
gets rid of them but still lets me slice and count the way I'd like,
I'm open to that as well.
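One alternative I can think of, sketched in Python with made-up data:
stay at the m*n (article, topic, region) grain, but keep article_id on
every fact row and count distinct articles instead of summing a hit
column. Any slice then counts each article at most once. The
`count_articles` helper is hypothetical; in SQL it would be
COUNT(DISTINCT article_id) with a WHERE clause.

```python
from itertools import product

# Hypothetical articles with multivalued topics and regions.
articles = {
    1: {"topics": ["energy", "trade"], "regions": ["EU", "US"]},
    2: {"topics": ["energy"], "regions": ["US"]},
}

# Fact rows at the (article, topic, region) grain, each carrying article_id.
facts = [
    {"article_id": a, "topic": t, "region": r}
    for a, attrs in articles.items()
    for t, r in product(attrs["topics"], attrs["regions"])
]

def count_articles(rows, **filters):
    """COUNT(DISTINCT article_id) over the rows matching every filter."""
    return len({
        row["article_id"]
        for row in rows
        if all(row[col] == val for col, val in filters.items())
    })

print(count_articles(facts, topic="energy"))               # 2
print(count_articles(facts, region="US"))                  # 2
print(count_articles(facts, topic="energy", region="EU"))  # 1
print(count_articles(facts))                               # 2
```

The catch is that distinct counts aren't additive: the per-topic
counts here (energy 2, trade 1) sum to 3, but there are only 2
articles, so a coarser aggregate can't be built by summing a finer
one.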
_______________________________________________
Activewarehouse-discuss mailing list
Activewarehouse-discuss@rubyforge.org
http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
