This is awesome, thank you Sean

>> This is probably my bad, but I understood the goal to be having a single
>> db containing unified, core tables. So, we'd have one db, with one
>> revision table, that'd have an extra column of "wiki" that denoted the
>> project the entry referred to. This would let us perform global queries
>> without the complex UNIONs mentioned above. Is this still the goal, or...?
>>
>
> No, that wasn't the goal. Sorry if there was miscommunication. The actual
> data will remain in separate wikis using regular replication.
>
> However, it's quite possible to create one or more unified databases with
> (for example) SQL VIEWs that union all tables from a set of pre-defined
> wikis, with 'wiki' columns, just as you describe. Same thing, really. We
> could even allow ad-hoc creation of unified views for whatever .dblist is
> appropriate for the project. I don't think anything need be ruled out yet
> -- that's the whole point of SQL, right? Slow, but flexible. :-)
>
>
> That would work, and Oliver is right that creating views for core tables in
> pre-defined wikis (say, all Wikipedias) would be valuable. Sean, how about
> we create a page on wikitech with requirements for these views and we take
> it from there?
>
Union-ified views sound great here. Let's see how they perform. I bet they'll be fine, but if they're not, maybe we can throw them into Hadoop? Using the views to do the MySQL -> Hadoop replication would be so much easier than going to each database individually.
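For illustration, here is a minimal Python sketch of how such a union view could be generated from a .dblist. It assumes (not stated in the thread) that a .dblist holds one database name per line with '#' comments, and that each wiki's tables are reachable as `dbname`.`table` on the same MySQL host. The view name `revision_all` and the helper functions are hypothetical. It only builds the SQL text; whether MySQL performs acceptably on the result is exactly what's worth measuring. Per the MySQL docs, views containing UNION ALL cannot use the MERGE algorithm.

```python
"""Sketch: build a MySQL VIEW that UNION ALLs one core table across a set
of wikis, tagging each row with a 'wiki' column.

Assumptions (hypothetical): .dblist = one db name per line, '#' comments;
view, table and column names below are illustrative only.
"""
from typing import Iterable, List


def read_dblist(text: str) -> List[str]:
    """Parse .dblist-style text: one db name per line, '#' starts a comment."""
    names = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_union_view_sql(view: str, table: str, columns: Iterable[str],
                         wikis: Iterable[str]) -> str:
    """Return a CREATE OR REPLACE VIEW statement unioning `table` per wiki."""
    cols = ", ".join(quote_ident(c) for c in columns)
    selects = []
    for wiki in wikis:
        # The wiki's name becomes a string literal in the 'wiki' column.
        literal = "'" + wiki.replace("'", "''") + "'"
        selects.append(
            f"SELECT {literal} AS wiki, {cols} "
            f"FROM {quote_ident(wiki)}.{quote_ident(table)}"
        )
    if not selects:
        raise ValueError("dblist contains no wikis")
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE OR REPLACE VIEW {quote_ident(view)} AS\n{body};"


if __name__ == "__main__":
    dblist = "# hypothetical subset of wikipedias\nenwiki\ndewiki\n"
    print(build_union_view_sql("revision_all", "revision",
                               ["rev_id", "rev_page", "rev_timestamp"],
                               read_dblist(dblist)))
```

UNION ALL (rather than UNION) is used so rows from different wikis are never de-duplicated against each other, and the literal 'wiki' column keeps every row attributable to its source project. The same generated view could in principle be the single source for a MySQL -> Hadoop import, but that part is untested here.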
_______________________________________________
Analytics mailing list
https://lists.wikimedia.org/mailman/listinfo/analytics
