CC'ing Leila, as we were experimenting with these some weeks back in SF. I
think they can be killed without problems. I did not know they were still
running; we ran a faster version of those queries and got the data we were
interested in a while back.

On Mon, Nov 10, 2014 at 1:55 AM, Sean Pringle <[email protected]>
wrote:

> Three identical queries from the 'research_prod' user have just passed one
> month execution time on s1-analytics-slave:
>
> select count(*)
> from staging.ourvision r
> where exists (
>   select *
>   from staging.ourvision r1
>           inner join
>        staging.ourvision r2
>           on r2.sha1 = r1.sha1
>     where r1.page_id = r.page_id
>       and r2.page_id = r.page_id
>       and DATE_ADD(r.timestamp, INTERVAL 1 HOUR)
>       and r2.timestamp between r.timestamp and DATE_SUB(r.timestamp, INTERVAL 1 HOUR)
>       and  r1.sha1!= r.sha1
> );
>
> I haven't checked to see if the queries are just that amazingly slow, or if
> they're part of a larger ongoing transaction. In any case, three month-long
> transactions are pushing the resource limits of the slave and will soon
> result in either mass replication lag or some other interesting lockup that
> may in turn take days to roll back :-)
>
> Can we kill these? Can we optimize and/or redesign the jobs? Happy to
> help...
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
