Hi again folks,

The operation has successfully finished.

The `wmf.webrequest` hive table now contains data coming from HAProxy.
We still have to rerun downstream jobs (mostly pageviews) for the first hours
of today so that we have a clean cut in the data at April 1st.
This will be done in the next few hours but shouldn't disrupt your work.

*An important thing to note* is that for cluster space reasons we only have
data since mid-March in the `wmf.webrequest` table.
It will build up to the full 90-day retention over the coming months.
If you need older data, you can use the `wmf_deprecated.webrequest` hive
table that contains the 'old' Varnish data we keep in case something goes
wrong with HAProxy.
We will keep this table for at least a month.
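
For those who script their queries, here is a minimal Python sketch of choosing
the table by date. The cutoff date (March 15), the helper names and the
year/month/day partition columns are assumptions for illustration only, so
please check them against the actual tables before relying on this.

```python
from datetime import date

# Assumption: "mid-March" is taken as 2025-03-15 purely for illustration;
# the exact first day available in wmf.webrequest is not stated in this email.
ASSUMED_FIRST_DAY_IN_NEW_TABLE = date(2025, 3, 15)


def webrequest_table_for(day: date) -> str:
    """Return the table to query for a given day (hypothetical helper)."""
    if day >= ASSUMED_FIRST_DAY_IN_NEW_TABLE:
        return "wmf.webrequest"
    return "wmf_deprecated.webrequest"


def build_count_query(day: date) -> str:
    """Build a simple per-day row count query (hypothetical helper).

    Assumes both tables are partitioned by year, month and day columns.
    """
    table = webrequest_table_for(day)
    return (
        f"SELECT COUNT(*) AS requests FROM {table} "
        f"WHERE year = {day.year} AND month = {day.month} AND day = {day.day}"
    )


if __name__ == "__main__":
    # One day before the assumed cutoff, one day after it.
    print(build_count_query(date(2025, 3, 1)))
    print(build_count_query(date(2025, 4, 2)))
```

A query spanning the cutoff would need to read from both tables, keeping in
mind that the two sources differ as described in the migration document.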

Of course if you spot something odd or if you have questions, please let us
know :)

On Tue, Apr 1, 2025 at 10:25 AM Joseph Allemandou <[email protected]>
wrote:

> Good morning data folks,
> This morning we are migrating our webrequest dataset to feed from HAProxy
> instead of Varnish.
> The expected differences are documented in this google doc
> <https://docs.google.com/document/d/1cCSGzLUfVWUHjqG5v5VdLADsbzmMklczQ1YG7oghGl8/edit?tab=t.0#heading=h.501d0uw4oyze>,
> as well as the detailed analysis and the migration plan.
> We expect to be done in a few hours. In the meantime, you may experience
> failing queries if you use the webrequest data, and new data will not be
> flowing until we are done.
> For those interested in following the operation, we'll post on slack in this
> thread
> <https://wikimedia.slack.com/archives/C05RHK7PS6Q/p1743494545802669>.
> We'll post here again at the end of the operation.
> Thank you for your understanding :)
>
> --
> Joseph Allemandou (joal) (he / him)
> Staff Data Engineer
> Wikimedia Foundation
>


-- 
Joseph Allemandou (joal) (he / him)
Staff Data Engineer
Wikimedia Foundation
_______________________________________________
Analytics mailing list -- [email protected]
To unsubscribe send an email to [email protected]