On Thursday, February 28, 2013 9:57:04 AM UTC-8, Taras Glek wrote:
> Justin Lebar wrote:
> > > It sounds to me like people want both
> > >
> > > 1) Easier access to aggregated data so they can build their own
> > > dashboards roughly comparable in features to the current dashboards.
> >
> > I doubt people actually want to build their own dashboards. I suspect
> > this is mainly a need because of deficiencies in the current dashboard.
> >
> > > 2) Easier access to raw databases so that people can build up more
> > > complex analyses, either by exporting the raw data from the db, or
> > > by analyzing it in the db.
> > >
> > > That is, I don't think we can or should export JSON with all the
> > > data in our databases. That is a lot of data.
> >
> > From the concrete examples I've seen so far, people want basic
> > aggregations. My FE in http://people.mozilla.org/~tglek/dashboard/
> > works on aggregated histogram JSONs. It seems completely reasonable to
> > aggregate all of the other info + simple_measurement fields (this is
> > on my TODO). That would solve all of the other concrete use cases
> > mentioned (Flash versions, hardware stats).
> >
> > I think we can be more aggressive still. We can also allow filtering
> > certain histograms by one of those highly variable info fields (e.g.
> > tab animations vs. gfx hardware, specific chromehangs vs. something
> > useful, etc.) without unreasonable overhead.
> >
> > I like my aggregated JSON approach because it's cheap on server CPU,
> > and as long as one partitions the JSON carefully, it can be compact
> > enough for gzip encoding to make it fast enough to download. This
> > should also make it easy to fork the dashboards, contribute, etc.
> >
> > I hope to feed more data into my frontend by the end of today, and
> > will aim for a live-ish dashboard by the end of next week.
> >
> > For advanced use cases, we can stick with Hadoop querying.
> >
> > ==Help wanted==
> >
> > If anyone knows a dev who is equally good at stats & programming, let
> > me know. I think we have a lot of useful data, and we can handle some
> > visualizations of that data, but a person skilled at extracting signal
> > out of noisy sources could help us squeeze the most use out of our
> > data.
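For concreteness, the histogram aggregation described above might look
roughly like the sketch below: summing per-submission bucket counts from
raw JSON dumps into one aggregate object per histogram. The shape of the
input records ("histograms" mapping names to bucket counts) is a
hypothetical stand-in for whatever the Hadoop dumps actually contain.

```python
import json
from collections import defaultdict

def aggregate_histograms(lines):
    """Sum per-submission histogram buckets into one total per histogram.

    Assumes (hypothetically) that each line is a JSON object shaped like:
      {"histograms": {"HIST_NAME": {"bucket": count, ...}, ...}}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for line in lines:
        submission = json.loads(line)
        for name, buckets in submission.get("histograms", {}).items():
            for bucket, count in buckets.items():
                totals[name][bucket] += count
    # Convert to plain dicts so the result serializes cleanly as the
    # aggregated JSON a dashboard frontend would download.
    return {name: dict(buckets) for name, buckets in totals.items()}
```

The aggregated output stays small regardless of how many submissions go
in, which is what makes a gzip-compressed static JSON cheap to serve.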
I'm pretty interested in this problem. I won't be so bold as to say that
I'm "skilled" in this area, but I have been successful at finding
interesting things in some noisy data sets. So I'm putting my hand up,
and I'll see what I can do over the next few days to hack around at it.
If others are interested in collaborating, please just ping me. :) I'm
on Laura's team, working primarily on Socorro.

> If someone wants to help with aggregations, I can hook you up with raw
> json dumps from hadoop.

I'm also interested in this, and probably more qualified to do this in
the short term, anyway. :) Is there a wishlist?

-selena
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform