(cc-ing Amir)

>> [sean] Is there a plan for purging old data from this one?

>[christian] Just to make expectations explicit:
>[christian] Since in a different part of this thread you are asking more for
>[christian] expected growth bounds, I assume that the table can stay at that size
>[christian] until discussion with Language about the way forward produced concrete
>[christian] next steps, and you do not expect us to prune data right away.

So that we are all on the same page: the table has a lot of data because the
i18n team was not aware that logging was happening until we notified them.
As Amir mentioned, the bug that prompted the logging has been fixed. As Dario
said, we definitely do not need that much data.
I confirmed last week that we only need 2 weeks of data to analyze; the
data is just a short "survey" of which fonts our users have available.
So yes, we could delete a bunch of the data, and I believe Amir was about to
ask us to do so.

Since I do not have permission to create tables, could we create a temporary
table that holds the last two weeks of data? We could use that for our analysis
and get rid of the other table once the bugfix is in production and logging
has stopped.
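
To make the request concrete, here is a rough sketch of the copy I have in
mind. It is only an illustration and rests on assumptions: it runs against
SQLite so it can be tried locally (the real database would need its own
quoting and a DBA to run it), the target table name tofu_last_two_weeks is
made up, and it assumes the table has a "timestamp" column holding
14-digit YYYYMMDDHHMMSS strings.

```python
import sqlite3
from datetime import datetime, timedelta

SOURCE = "UniversalLanguageSelector-tofu_7629564"  # table named in this thread
TARGET = "tofu_last_two_weeks"                     # hypothetical name


def copy_recent_rows(conn, now, days=14):
    """Copy rows from the last `days` days into TARGET; return the row count.

    Assumes a `timestamp` column of YYYYMMDDHHMMSS strings, which sort
    correctly when compared as plain text.
    """
    cutoff = (now - timedelta(days=days)).strftime("%Y%m%d%H%M%S")
    conn.execute(f'DROP TABLE IF EXISTS "{TARGET}"')
    # Create an empty table with the same columns, then fill it.
    conn.execute(f'CREATE TABLE "{TARGET}" AS SELECT * FROM "{SOURCE}" WHERE 0')
    conn.execute(
        f'INSERT INTO "{TARGET}" SELECT * FROM "{SOURCE}" WHERE timestamp >= ?',
        (cutoff,),
    )
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{TARGET}"').fetchone()[0]
```

Calling copy_recent_rows(conn, datetime.utcnow()) would keep only the most
recent two weeks; the large table itself is left untouched until we agree
to drop it.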


>[sean] I'm interested in identifying the expected growth bounds rather
>[sean] than limiting tables arbitrarily.

This is definitely in our court: we need to determine those bounds and
throttle when they are exceeded.
We currently have no throttling on record creation. We detect the higher
data throughput, but that's about it.
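
As a starting point for discussion, throttling per schema could look
roughly like the sketch below. This is not how EventLogging works today;
the class name, the limit, and the window size are all hypothetical, and a
fixed-window counter is just the simplest scheme I could think of.

```python
class SchemaThrottle:
    """Fixed-window limiter: at most `limit` events per schema per window.

    Hypothetical sketch; the names and the policy are assumptions, not
    existing EventLogging code.
    """

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._state = {}  # schema name -> (window index, events accepted)

    def allow(self, schema, now):
        """Return True if an event for `schema` at time `now` (seconds) fits."""
        window = int(now // self.window_seconds)
        last_window, count = self._state.get(schema, (window, 0))
        if last_window != window:
            count = 0  # a new window started, so the counter resets
        if count >= self.limit:
            return False  # bound exceeded: drop (or sample) this event
        self._state[schema] = (window, count + 1)
        return True
```

Each schema gets its own counter, so one noisy schema cannot use up the
budget of the others. The actual limits would come from the growth bounds
we still need to agree on.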

I have created a backlog item to this effect:
https://bugzilla.wikimedia.org/show_bug.cgi?id=67470

On Thu, Jul 3, 2014 at 11:02 AM, Christian Aistleitner <
[email protected]> wrote:

> Hi Sean,
>
> On Thu, Jul 03, 2014 at 12:21:34PM +1000, Sean Pringle wrote:
> > The following table is easily the largest in eventlogging and growing
> > fastest:
> >
> > 114G     UniversalLanguageSelector-tofu_7629564
>
> thanks for the heads up!
>
> We are aware of UniversalLanguageSelector-tofu producing too much data
> since 2014-06-25 ([1], [2]), and Nuria is on it.
>
> As I could not find a corresponding bug, I created one to track the
> issue at:
>   https://bugzilla.wikimedia.org/show_bug.cgi?id=67463
>
>
> > Is there a plan for purging old data from this one?
>
> Just to make expectations explicit:
> Since in a different part of this thread you are asking more for
> expected growth bounds, I assume that the table can stay at that size
> until discussion with Language about the way forward produced concrete
> next steps, and you do not expect us to prune data right away.
>
> > There is a duplicate table called UniversalLanguageSelecTor-tofu_7629564
> > -- note the uppercase T -- with a single row. Is that needed?
>
> I noted that too when looking at the issue last week, but decided
> against calling it out, since it's just a single small table.
> I expect we see these artifacts from time to time. Do they get in the
> way somehow, or is it ok to just keep them around?
>
> Thanks,
> Christian
>
>
> [1] http://lists.wikimedia.org/pipermail/analytics/2014-June/002260.html
> [2] search for “tofu” on
>   http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140625.txt
>
>
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>                            Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email:  [email protected]
> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>                              Fax:            +43 7946 / 20 5 81
>                              Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>