+1 to Dario's mention of the many schemas that just capture production DB
stuff in a better way.

Re. growth: Old growth experiment schemas continue to be a great resource
for checking old work and sometimes even new hypotheses.  When Dario and
Kevin get around to us, I'll have a complete list of schemas that should
not be purged.

Re. storage parameters in the Schema, I agree with Ori, but I'd still like
to have them on the wiki somehow.  If we were a bunch of Wikipedia editors,
I'd suggest making a template for the talk page of a schema that captures
this metadata.  Since a template is probably not the best fit here and we'd
likely want to stick to JSON, a subpage might be in order instead.

E.g.

   - Schema:Foo == data type JSON
   - Schema:Foo/restrictions == storage restrictions JSON (sampling,
     pruning, indexing, etc.)
   - Schema_talk:Foo == Discussion of Schema:Foo

Such a pattern would allow for changes to storage restrictions without
changing the rev_id of the schema page (data type).
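
A rough sketch of how the split could look to a consumer, in Python. Everything
here is an assumption for illustration only: the field names in the schema, the
key names in the restrictions JSON (sampling, pruning, indexing), the rev_id
value, and the helper functions are all hypothetical, not an existing API.

```python
import json

# Hypothetical content of Schema:Foo (the data type). Field names are made up.
SCHEMA_FOO = json.loads("""
{
  "description": "Example schema",
  "properties": {
    "userId": {"type": "integer", "required": true},
    "action": {"type": "string", "required": true}
  }
}
""")

# Hypothetical content of Schema:Foo/restrictions. The key names are
# assumptions; the proposal only says "sampling, pruning, indexing, etc."
RESTRICTIONS_FOO = json.loads("""
{
  "sampling": {"rate": 0.01},
  "pruning": {"max_age_days": 90},
  "indexing": ["userId"]
}
""")


def restrictions_title(schema_title):
    """Return the subpage title that would hold a schema's storage restrictions."""
    return schema_title + "/restrictions"


def unknown_indexed_fields(schema, restrictions):
    """List indexed fields that are not declared as properties of the schema."""
    declared = set(schema.get("properties", {}))
    return [f for f in restrictions.get("indexing", []) if f not in declared]


def storage_plan(schema_title, schema_rev_id, restrictions):
    """Pair a pinned schema revision with whatever the restrictions page says now."""
    return {
        "schema": schema_title,
        "schema_rev_id": schema_rev_id,  # unaffected by edits to the subpage
        "restrictions_page": restrictions_title(schema_title),
        "sample_rate": restrictions.get("sampling", {}).get("rate", 1.0),
        "max_age_days": restrictions.get("pruning", {}).get("max_age_days"),
        "indexed_fields": restrictions.get("indexing", []),
    }


# 12345 is a placeholder rev_id, not a real revision.
plan = storage_plan("Schema:Foo", 12345, RESTRICTIONS_FOO)
```

The point of the sketch is only that the schema revision and the restrictions
are looked up separately, so editing Schema:Foo/restrictions leaves the pinned
rev_id of Schema:Foo alone.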

-Aaron


On Thu, May 29, 2014 at 1:26 AM, Steven Walling <[email protected]>
wrote:

>
> On Wed, May 28, 2014 at 10:50 AM, Dan Andreescu <[email protected]>
> wrote:
>
>> I just announced this potential change in Scrum of Scrums and the Mobile
>> team said they also would like to keep old data, but not for all of their
>> schemas.  They're cleaning up their graphs and we should check with them
>> when we start deleting.
>
>
> Following up on this from the Growth perspective...
>
> My main question is what the rationale is. Is it to improve query
> performance on analytics dbs?
>
> I do know there are many older schemas for Growth-related experiments that
> are only really useful for historical analysis, which is kind of hard to
> reconstruct anyway. If there are sound technical reasons to chuck stuff
> from the relational dbs and retain it only in the raw JSON logs, then I'm
> potentially okay with helping figure out a list of schemas to retain and
> schemas to purge. Aaron, thoughts?
>
> --
> Steven Walling,
> Product Manager
> https://wikimediafoundation.org/
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>