Re: [Analytics] db1047 & one box to rule them all

Oliver Keyes Wed, 30 Apr 2014 10:21:52 -0700

Not quite there yet - just pointing to it as a potential blocker to the
"let's move everything to Hadoop!" idea (which I fully support). If the
goal is to enable research using unified data, but the unified data is more
difficult to access than the non-unified data, we probably haven't moved
the needle enough to justify it. "A sane way to access this stuff from
Python and R" should probably be considered a pretty firm prerequisite,
because without that, the utility isn't tremendously increased.
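
To make the unified-view idea from the quoted thread concrete, here is a minimal sketch of a view that unions a per-wiki table and adds a 'wiki' column, queried from Python. Everything in it is an assumption for illustration: it uses an in-memory SQLite database as a stand-in for the real replicas, the tables `enwiki_revision` and `dewiki_revision` stand in for `revision` tables in separate per-wiki databases, the two columns are a cut-down schema, and `build_unified_view` is a hypothetical helper, not something that exists on db1047.

```python
import sqlite3

# Hypothetical list of wikis; in practice this might come from a .dblist file.
WIKIS = ["enwiki", "dewiki"]


def build_unified_view(conn, wikis, table="revision"):
    """Create a view that unions one table across wikis, tagging each row
    with the wiki it came from (hypothetical helper for illustration)."""
    selects = [
        f"SELECT '{wiki}' AS wiki, rev_id, rev_page FROM {wiki}_{table}"
        for wiki in wikis
    ]
    conn.execute(
        f"CREATE VIEW unified_{table} AS " + " UNION ALL ".join(selects)
    )


# In-memory SQLite stands in for the real database server (assumption).
conn = sqlite3.connect(":memory:")
for wiki in WIKIS:
    conn.execute(
        f"CREATE TABLE {wiki}_revision (rev_id INTEGER, rev_page INTEGER)"
    )
conn.execute("INSERT INTO enwiki_revision VALUES (1, 10), (2, 11)")
conn.execute("INSERT INTO dewiki_revision VALUES (1, 20)")

build_unified_view(conn, WIKIS)

# A "global" query with no hand-written UNIONs on the caller's side.
rows = conn.execute(
    "SELECT wiki, COUNT(*) FROM unified_revision GROUP BY wiki ORDER BY wiki"
).fetchall()
print(rows)  # [('dewiki', 1), ('enwiki', 2)]
```

The point of the sketch is only that the caller sees one table with a 'wiki' column; how well such a view performs across hundreds of wikis is a separate question.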



On 30 April 2014 09:42, Toby Negrin <[email protected]> wrote:

> I think we'll put everything on Hadoop at some point but we're focusing on
> the page views now.
>
> Regarding the bug - if you're ready to use it I can see if Andrew can
> install the java package.
>
> -Toby
>
> On Apr 30, 2014, at 9:34 AM, Oliver Keyes <[email protected]> wrote:
>
>
>
>
> On 30 April 2014 06:59, Dan Andreescu <[email protected]> wrote:
>
>> This is awesome, thank you Sean
>>
>>>  *This is probably my bad, but I understood the goal to be having a
>>>> single db containing unified, core tables. So, we'd have one db, with one
>>>> revision table, that'd have an extra column of "wiki" that denoted the
>>>> project the entry referred to. This would let us perform global queries
>>>> without the complex UNIONs mentioned above. Is this still the goal, or...?
>>>>
>>>
>>> No, that wasn't the goal. Sorry if there was miscommunication. The
>>> actual data will remain in separate wikis using regular replication.
>>>
>>> However, it's quite possible to create one or more unified databases
>>> with (for example) SQL VIEWs that union all tables from a set of
>>> pre-defined wikis, with 'wiki' columns, just as you describe. Same thing,
>>> really. We could even allow ad-hoc creation of unified views for whatever
>>> .dblist is appropriate for the project. I don't think anything need be
>>> ruled out yet -- that's the whole point of SQL, right? Slow, but flexible.
>>> :-)
>>>
>>>
>>> that would work, Oliver is right that creating views for core tables in
>>> pre-defined wikis (say, all wikipedias) would be valuable. Sean, how about
>>> we create a page on wikitech with requirements for these views and we take
>>> it from there?
>>>
>>
>> Union-ified views sound great here.  Let's see how they perform.  I bet
>> they'll be fine but if they're not, maybe we can throw them into Hadoop?
>>  Using the views to do the MySQL -> Hadoop replication would be so much
>> easier than going to each database individually.
>>
>> Totally down for that, but...
> https://bugzilla.wikimedia.org/show_bug.cgi?id=64262
>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>


-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
