cc-ing leila as we were experimenting with these some weeks back in SF, I think they can be killed w/o problems. I did not know they were still running, we run a faster version of those queries and got the data we were interested in a while back.
On Mon, Nov 10, 2014 at 1:55 AM, Sean Pringle <[email protected]> wrote: > Three identical queries from the 'research_prod' user have just passed one > month execution time on s1-anlytics-slave: > > select count(*) > from staging.ourvision r > where exists ( > select * > from staging.ourvision r1 > inner join > staging.ourvision r2 > on r2.sha1 = r1.sha1 > where r1.page_id = r.page_id > and r2.page_id = r.page_id > and DATE_ADD(r.timestamp, INTERVAL 1 HOUR) > and r2.timestamp between r.timestamp and DATE_SUB(r.timestamp , > INTERVAL 1 HOUR) > and r1.sha1!= r.sha1 > ); > > I havn't checked to see if the queries are just that amazingly slow, or if > they're part of a larger ongoing transaction. In any case, three month-long > transactions is pushing the resource limits of the slave and will soon > result in either mass replication lag or some other interesting lockup that > may in turn take days to rollback :-) > > Can we kill these? Can we optimize and/or redesign the jobs? Happy to > help... > > _______________________________________________ > Analytics mailing list > [email protected] > https://lists.wikimedia.org/mailman/listinfo/analytics > >
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
