Justin Lebar wrote:
It sounds to me like people want both

1) Easier access to aggregated data so they can build their own
dashboards roughly comparable in features to the current dashboards.

I doubt people actually want to build their own dashboards. I suspect this need exists mainly because of deficiencies in the current dashboard.


2) Easier access to raw databases so that people can build up more
complex analyses, either by exporting the raw data from the db, or by
analyzing it in the db.

That is, I don't think we can or should export JSON with all the data
in our databases.  That is a lot of data.

From the concrete examples I've seen so far, people want basic aggregations. My frontend at http://people.mozilla.org/~tglek/dashboard/ works on aggregated histogram JSONs. It seems completely reasonable to aggregate all of the other info + simple_measurement fields as well (this is on my TODO). That would cover all of the other concrete use-cases mentioned (Flash versions, hardware stats).

I think we can be more aggressive still: we can also allow filtering certain histograms by one of those highly variable info fields (e.g. tab animations vs. gfx hardware, specific chromehangs vs. something useful, etc.) without unreasonable overhead.

I like my aggregated-JSON approach because it's cheap on server CPU, and as long as one partitions the JSON carefully, gzip encoding keeps it compact enough to download quickly. This should also make it easy to fork the dashboards, contribute, etc.
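To make the aggregation + filtering idea concrete, here is a minimal sketch. The submission shape (an "info" dict plus per-histogram bucket counts) and the field names are illustrative assumptions, not the real telemetry ping schema:

```python
import json
from collections import defaultdict

# Hypothetical raw submissions; field names are made up for illustration.
submissions = [
    {"info": {"appVersion": "21.0", "gfxVendor": "nvidia"},
     "histograms": {"GC_MS": {"0": 5, "10": 2}}},
    {"info": {"appVersion": "21.0", "gfxVendor": "intel"},
     "histograms": {"GC_MS": {"0": 1, "10": 4}}},
]

def aggregate(submissions, filter_field=None):
    """Sum per-bucket counts, optionally partitioned by one info field."""
    out = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for sub in submissions:
        # Partition key: either a single "all" bucket, or the value of
        # one highly variable info field (the filtering idea above).
        key = sub["info"].get(filter_field, "?") if filter_field else "all"
        for name, buckets in sub["histograms"].items():
            for bucket, count in buckets.items():
                out[key][name][bucket] += count
    return out

# Aggregated across everything, then partitioned by gfx vendor:
print(json.dumps(aggregate(submissions)))
print(json.dumps(aggregate(submissions, "gfxVendor")))
```

The resulting JSON is small relative to the raw data and compresses well, which is the point of shipping aggregates rather than raw dumps to the frontend.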

I hope to feed more data into my frontend by end of today and will aim for a live-ish dashboard by end of next week.

For advanced use-cases, we can stick with Hadoop querying.

==Help wanted==

If anyone knows a dev who is equally good at stats & programming, let me know. I think we have a lot of useful data, we can handle some visualizations of that data, but a person skilled at extracting signal out of noisy sources could help us squeeze the most use out of our data.


I spend too much time on management to make quick progress; I wrote the prototype to prove to myself that the JSON schema is feasible.

If someone wants to help with aggregations, I can hook you up with raw JSON dumps from Hadoop. For everything else, the code is on GitHub (https://github.com/tarasglek/telemetry-frontend). Help wanted: UX improvements such as easier-to-use selectors, incremental search, and switching to a superior charting library such as flotcharts.org.


On Thu, Feb 28, 2013 at 12:08 PM, Benjamin Smedberg
<benja...@smedbergs.us>  wrote:
On 2/28/2013 10:59 AM, Benoit Jacob wrote:
Because the raw crash files do not include new metadata fields, this has
led to weird engineering practices like shoving interesting metadata into
the freeform app notes field, and then parsing that data back out later.
I'm worried about perpetuating this kind of behavior, which is hard on
the
database and leads to very arcane queries in many cases.

I don't agree with the notion that freeform fields are bad. Freeform plain
text is an amazing file format: it allows adding any kind of data without
administrative overhead, and it is still easy to parse (provided the data
that was added was formatted with easy parsing in mind).
The obvious disadvantage is that it is much more difficult to
machine-process. For example, elasticsearch can't index it (at least not
without lots of custom parsing), and in general you can't ask tools like
hbase or elasticsearch to filter on it without a user-defined function.
(Regexes might work for some kinds of text processing.)
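As a sketch of the kind of after-the-fact parsing a freeform field forces on consumers: the app-notes blob and the `key: value` convention below are assumptions for illustration, not the actual crash-stats format.

```python
import re

# Hypothetical freeform "app notes" blob, assuming a loose "Key: value"
# convention was used when the data was written.
app_notes = "AdapterVendorID: 0x10de, AdapterDeviceID: 0x0a65, D2D+ DWrite+"

# Pull the key/value pairs back out with a regex -- workable, but every
# consumer has to reinvent this, and anything not matching the pattern
# (like the bare "D2D+" flags) is silently dropped.
pairs = dict(re.findall(r"(\w+):\s*([^\s,]+)", app_notes))
print(pairs["AdapterVendorID"])  # 0x10de
```

This is exactly the "parsing that data back out later" Benjamin describes: it works only as long as everyone writing into the field sticks to an unenforced convention.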

But if one considers it a bad thing that people use it, then one should
address the issues that are causing people to use it. As you mention, raw
crash files may not include newer metadata fields. So maybe that can be
fixed by making it easier or even automatable to include new fields in raw
crash files?
Yes, that is all filed. We can't automatically include new fields, because we
don't know whether they are supposed to be public or private, but we should
soon be able to have a dynamically updateable list.

Note that if mcmanus is correct, we're going to be dealing with 1M fields
per day here. That's a lot more than the 250k from crash-stats, especially
because the payload is bigger. I believe that the flat files from
crash-stats are a really useful kludge because we couldn't figure out a
better way to expose the raw data. But that kludge will start to fall over
pretty quickly, and perhaps we should just expose a better way to do queries
using the databases, which are surprisingly good at doing these kinds of
queries efficiently.


--BDS

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
