This is awesome, thank you, Sean.

>> This is probably my bad, but I understood the goal to be having a single
>> db containing unified, core tables. So, we'd have one db, with one
>> revision table, that'd have an extra column of "wiki" that denoted the
>> project the entry referred to. This would let us perform global queries
>> without the complex UNIONs mentioned above. Is this still the goal, or...?
>>
>
> No, that wasn't the goal. Sorry if there was miscommunication. The actual
> data will remain in separate wikis using regular replication.
>
> However, it's quite possible to create one or more unified databases with
> (for example) SQL VIEWs that union all tables from a set of pre-defined
> wikis, with 'wiki' columns, just as you describe. Same thing, really. We
> could even allow ad-hoc creation of unified views for whatever .dblist is
> appropriate for the project. I don't think anything need be ruled out yet
> -- that's the whole point of SQL, right? Slow, but flexible. :-)
>
>
> That would work. Oliver is right that creating views for core tables in
> pre-defined wikis (say, all wikipedias) would be valuable. Sean, how about
> we create a page on wikitech with requirements for these views and we take
> it from there?
>

Union-ified views sound great here.  Let's see how they perform.  I bet
they'll be fine but if they're not, maybe we can throw them into Hadoop?
Using the views to do the MySQL -> Hadoop replication would be so much
easier than going to each database individually.
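
To make the idea concrete, here is a rough sketch of what a union-ified
view with a 'wiki' column could look like. It is only an illustration
under assumptions: it uses Python's sqlite3 with prefixed tables
(enwiki_revision, dewiki_revision) standing in for the per-wiki MySQL
databases, and the WIKIS list, the create_unified_view helper, and the
all_revision view name are all made up for the example. The real thing
would presumably be generated from a .dblist and reference each wiki's
own database.

```python
import sqlite3

# Hypothetical wiki list; in practice this would be read from a .dblist.
WIKIS = ["enwiki", "dewiki"]


def create_unified_view(conn, wikis, view_name="all_revision"):
    """Create a view that UNION ALLs each wiki's revision table,
    tagging every row with the wiki it came from."""
    selects = [
        # Assumption: each wiki's table is named <wiki>_revision here;
        # on MySQL it would be <wiki>.revision in a separate database.
        f"SELECT '{wiki}' AS wiki, rev_id, rev_page FROM {wiki}_revision"
        for wiki in wikis
    ]
    conn.execute(f"CREATE VIEW {view_name} AS " + " UNION ALL ".join(selects))


# Toy data so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
for wiki in WIKIS:
    conn.execute(
        f"CREATE TABLE {wiki}_revision (rev_id INTEGER, rev_page INTEGER)"
    )
conn.execute("INSERT INTO enwiki_revision VALUES (1, 10), (2, 11)")
conn.execute("INSERT INTO dewiki_revision VALUES (1, 20)")

create_unified_view(conn, WIKIS)

# A "global" query with no hand-written UNIONs.
rows = conn.execute(
    "SELECT wiki, COUNT(*) FROM all_revision GROUP BY wiki ORDER BY wiki"
).fetchall()
print(rows)
```

The same view could also serve as the single source for a MySQL ->
Hadoop export, since each row already carries its wiki.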
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics