Not quite there yet - just pointing to it as a potential blocker to the "let's move everything to Hadoop!" idea (which I fully support). If the goal is to enable research using unified data, but the unified data is more difficult to access than the non-unified data, we probably haven't moved the needle enough to justify it. "A sane way to access this stuff from Python and R" should probably be considered a pretty firm prerequisite, because without that, the utility isn't tremendously increased.
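For reference, here is a rough sketch of how the union-ified views discussed in the quoted thread below might be generated from Python. It assumes each wiki lives in a database named after it (e.g. enwiki.revision). The view name, the two-wiki list and the column subset are made up for illustration and are not anything agreed in the thread. It only builds the SQL string and does not touch a database.

```python
import re

# Plain identifiers only, so names can be interpolated into SQL safely.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_union_view(view_name, table, columns, wikis):
    """Return a CREATE VIEW statement that unions `table` across `wikis`.

    Assumption: one database per wiki, named after the wiki. A literal
    'wiki' column tags each row with the project it came from.
    """
    if not wikis:
        raise ValueError("need at least one wiki")
    for name in [view_name, table, *columns, *wikis]:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    col_list = ", ".join(columns)
    selects = [
        f"SELECT '{wiki}' AS wiki, {col_list} FROM {wiki}.{table}"
        for wiki in wikis
    ]
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


# Hypothetical .dblist contents and column subset, for illustration only.
wikis = ["enwiki", "dewiki"]
sql = build_union_view(
    "unified_revision", "revision",
    ["rev_id", "rev_page", "rev_timestamp"], wikis,
)
print(sql)
```

With a view like that in place, a "global" query is a single SELECT with a filter or GROUP BY on the wiki column, rather than a hand-written UNION per project.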

On 30 April 2014 09:42, Toby Negrin <[email protected]> wrote:

> I think we'll put everything on Hadoop at some point but we're focusing on
> the page views now.
>
> Regarding the bug - if you're ready to use it I can see if Andrew can
> install the java package.
>
> -Toby
>
> On Apr 30, 2014, at 9:34 AM, Oliver Keyes <[email protected]> wrote:
>
> On 30 April 2014 06:59, Dan Andreescu <[email protected]> wrote:
>
>> This is awesome, thank you Sean
>>
>>> *This is probably my bad, but I understood the goal to be having a
>>>> single db containing unified, core tablets. So, we'd have one db, with one
>>>> revision table, that'd have an extra column of "wiki" that denoted the
>>>> project the entry referred to. This would let us perform global queries
>>>> without the complex UNIONs mentioned above. Is this still the goal, or...?
>>>>
>>>
>>> No, that wasn't the goal. Sorry if there was miscommunication. The
>>> actual data will remain in separate wikis using regular replication.
>>>
>>> However, it's quite possible to create one or more unified databases
>>> with (for example) SQL VIEWs that union all tables from a set of
>>> pre-defined wikis, with 'wiki' columns, just as you describe. Same thing,
>>> really. We could even allow ad-hoc creation of unified views for whatever
>>> .dblist is appropriate for the project. I don't think anything need be
>>> ruled out yet -- that's the whole point of SQL, right? Slow, but flexible.
>>> :-)
>>>
>>> that would work, Oliver is right that creating views for core tables in
>>> pre-defined wikis (say, all wikipedias) would be valuable. Sean, how about
>>> we create a page on wikitech with requirements for these views and we take
>>> it from there?
>>>
>>
>> Union-ified views sound great here. Let's see how they perform. I bet
>> they'll be fine but if they're not, maybe we can throw them into Hadoop?
>> Using the views to do the MySQL -> Hadoop replication would be so much
>> easier than going to each database individually.
>>
>> Totally down for that, but...
> https://bugzilla.wikimedia.org/show_bug.cgi?id=64262

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
