It is worth pointing out that our base database infrastructure is 35 times larger, and our (non-duplicate) base data set is 10-15 times larger, compressed. We also serve 30 times as many database queries as they do, with peaks of 10-20 times as many queries per second per server, despite their hardware being twice as powerful as our newest hardware.
All that with around 6-7 people working in infrastructure (vs. 11 of us).

This doesn't have anything to do with the original post. I just wanted to a) agree with Dan that we need better analytics infrastructure (re: "Something to aspire to, perhaps collaborate with them on.") and b) explain why this hasn't been done already and why it is complex. It is, however, a known request from Analytics, Research, and other Labs users.

Sources:
<http://stackexchange.com/performance>
<https://wikimediafoundation.org/wiki/Staff_and_contractors>
<https://www.mediawiki.org/wiki/File:MySQL_at_Wikipedia.pdf>
<http://stackexchange.com/about/team#Engineering>

On Sat, Nov 14, 2015 at 2:18 AM, Dan Andreescu <[email protected]> wrote:

> For anyone else interested: Nemo was able to answer this question because
>> StackExchange has a Quarry <http://quarry.wmflabs.org/>-like public
>> query interface of their own. You should go play with it right now:
>> http://data.stackexchange.com/
>>
>
> It's worth pointing out one major difference between their Quarry-like
> thing and our Quarry. I love both, btw. But our Quarry suffers because
> the only public database we have is raw and has a schema meant for OLTP.
> StackExchange's is clearly hitting a well organized OLAP style schema.
> Something to aspire to, perhaps collaborate with them on.
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

--
Jaime Crespo
<http://wikimedia.org>
