On Thursday, February 28, 2013 9:57:04 AM UTC-8, Taras Glek wrote:
> Justin Lebar wrote:
> > > It sounds to me like people want both
> > >
> > > 1) Easier access to aggregated data so they can build their own
> > > dashboards roughly comparable in features to the current dashboards.
> >
> > I doubt people actually want to build their own dashboards. I suspect
> > this is mainly a need because of deficiencies in the current dashboard.
> >
> > > 2) Easier access to raw databases so that people can build up more
> > > complex analyses, either by exporting the raw data from the db, or
> > > by analyzing it in the db.
> > >
> > > That is, I don't think we can or should export JSON with all the
> > > data in our databases. That is a lot of data.
> >
> > From the concrete examples I've seen so far, people want basic
> > aggregations. My FE in http://people.mozilla.org/~tglek/dashboard/
> > works on aggregated histogram JSONs. It seems completely reasonable to
> > aggregate all of the other info + simple_measurement fields (this is
> > on my TODO). That would solve all of the other concrete use cases
> > mentioned (Flash versions, hardware stats).
> >
> > I think we can be more aggressive still. We can also allow filtering
> > certain histograms by one of those highly variable info fields (e.g.
> > tab animations vs. gfx hardware, specific chromehangs vs. something
> > useful, etc.) without unreasonable overhead.
> >
> > I like my aggregated JSON approach because it's cheap on server CPU,
> > and as long as one partitions the JSON carefully, it can be compact
> > enough for gzip encoding to make it fast enough to download. This
> > should also make it easy to fork the dashboards, contribute, etc.
> >
> > I hope to feed more data into my frontend by the end of today, and
> > will aim for a live-ish dashboard by the end of next week.
> >
> > For advanced use cases, we can stick with Hadoop querying.
> >
> > ==Help wanted==
> >
> > If anyone knows a dev who is equally good at stats & programming, let
> > me know. I think we have a lot of useful data, and we can handle some
> > visualizations of that data, but a person skilled at extracting signal
> > out of noisy sources could help us squeeze the most use out of our
> > data.
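For concreteness, the histogram aggregation described above might look
roughly like the sketch below: summing per-submission bucket counts from
raw JSON dumps into one aggregate object per histogram. The shape of the
input records ("histograms" mapping names to bucket counts) is a
hypothetical stand-in for whatever the Hadoop dumps actually contain.

```python
import json
from collections import defaultdict

def aggregate_histograms(lines):
    """Sum per-submission histogram buckets into one total per histogram.

    Assumes (hypothetically) that each line is a JSON object shaped like:
      {"histograms": {"HIST_NAME": {"bucket": count, ...}, ...}}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for line in lines:
        submission = json.loads(line)
        for name, buckets in submission.get("histograms", {}).items():
            for bucket, count in buckets.items():
                totals[name][bucket] += count
    # Convert to plain dicts so the result serializes cleanly as the
    # aggregated JSON a dashboard frontend would download.
    return {name: dict(buckets) for name, buckets in totals.items()}
```

The aggregated output stays small regardless of how many submissions go
in, which is what makes a gzip-compressed static JSON cheap to serve.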
I'm pretty interested in this problem. I won't be so bold as to say that
I'm "skilled" in this area, but I have been successful at finding
interesting things in some noisy data sets. So I'm putting my hand up,
and I'll see what I can do over the next few days to hack around at it.
If others are interested in collaborating, please just ping me. :) I'm
on Laura's team, working primarily on Socorro.

> If someone wants to help with aggregations, I can hook you up with raw
> json dumps from hadoop.

I'm also interested in this, and probably more qualified to do this in
the short term, anyway. :) Is there a wishlist?

-selena
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform