I strongly recommend Google's Visualization API. It is divided into two parts: the reporting half and the data source half. The reporting half is pretty good and very easy to use from JavaScript. It is the library that underlies pretty much all of Google's internal and external web visualizations.
The data source half might actually be of more use for Mahout. It provides a simplified query language, query parsers, standard provisions for data sources that handle only a subset of the possible query language, and shims that supply the remaining bits of query semantics. The great virtue of this layer is that it provides a very clean abstraction that separates data from presentation. That separation lets you be very exploratory at the visualization layer while rebuilding the data layer as needed for performance. Together, these layers make it quite plausible to handle millions of data points using the very common strategy of doing the heavy lifting at the data layer and transporting only modest amounts of summary data to the presentation layer.

The data layer is also general enough that you could almost certainly use it with alternative visualization layers. For instance, you can specify that data be returned in CSV format, which would make R usable for visualization. JSON, on the other hand, makes Google's visualization code easy to use, and would also make Processing or Processing.js quite usable.

I have ported the Java version of the data source code to build with Maven using a standard directory layout, and have added a version of the MySQL support code to allow integration with standard web service frameworks. It can be found on GitHub here:

https://github.com/tdunning/visualization-data-source

The original Google site on the subject is here:

http://code.google.com/apis/chart/
http://code.google.com/apis/chart/interactive/docs/dev/dsl_about.html

On Sat, Sep 17, 2011 at 1:23 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> I'll be checking in an abstraction, people can implement writers as they
> see fit.
>
> FWIW, I'm mostly looking for something that can be used in a visualization
> toolkit, such as Gephi (although I'll be impressed if any of them can handle
> 7M points)
>
> -Grant
>
> On Sep 16, 2011, at 7:14 PM, Ted Dunning wrote:
>
> > Indeed.
> >
> > I strongly prefer the other two for expressivity.
> >
> > On Fri, Sep 16, 2011 at 4:37 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> >
> >> On Fri, Sep 16, 2011 at 3:30 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>
> >>> I think that Avro and protobufs are the current best options for large
> >>> data assets like this.
> >>>
> >>
> >> (or serialized Thrift)
> >>
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> Lucene Eurocon 2011: http://www.lucene-eurocon.com
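To make the data/presentation split described above a bit more concrete (do the heavy lifting at the data layer, ship only a small summary, as CSV for R or as JSON for Google's charting code), here is a minimal Java sketch. It assumes a simple fixed-bin histogram over a known range is an acceptable summary. `HistogramSummary` and its methods are hypothetical names for illustration only; the sketch does not use the Google Visualization Data Source library, and its JSON merely mimics the cols/rows layout of a Google DataTable rather than implementing the full data source response protocol.

```java
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical helper (not part of the Google Visualization Data Source
 * library): reduces a large stream of values to a small fixed-size
 * histogram at the data layer, then renders only that summary as CSV
 * or as JSON for the presentation layer.
 */
class HistogramSummary {
    private final double min;
    private final double max;
    private final long[] counts;

    HistogramSummary(double min, double max, int bins) {
        if (bins <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need bins > 0 and max > min");
        }
        this.min = min;
        this.max = max;
        this.counts = new long[bins];
    }

    /** Adds one raw point; out-of-range values are clamped to the edge bins, NaN is skipped. */
    void add(double x) {
        if (Double.isNaN(x)) {
            return;
        }
        int bins = counts.length;
        int i = (int) Math.floor((x - min) / (max - min) * bins);
        counts[Math.max(0, Math.min(bins - 1, i))]++;
    }

    /** Lower edge of bin i. */
    private double lowerEdge(int i) {
        return min + (max - min) * i / counts.length;
    }

    /** Summary as CSV with a header row (e.g. for read.csv() in R). */
    String toCsv() {
        StringBuilder sb = new StringBuilder("bin_start,count\n");
        for (int i = 0; i < counts.length; i++) {
            sb.append(String.format(Locale.ROOT, "%.4f,%d\n", lowerEdge(i), counts[i]));
        }
        return sb.toString();
    }

    /** Summary as JSON mimicking the cols/rows/c/v layout of a Google DataTable. */
    String toJson() {
        StringBuilder sb = new StringBuilder("{\"cols\":[")
            .append("{\"id\":\"bin_start\",\"label\":\"bin start\",\"type\":\"number\"},")
            .append("{\"id\":\"count\",\"label\":\"count\",\"type\":\"number\"}],\"rows\":[");
        for (int i = 0; i < counts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(String.format(Locale.ROOT, "{\"c\":[{\"v\":%.4f},{\"v\":%d}]}",
                lowerEdge(i), counts[i]));
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        HistogramSummary h = new HistogramSummary(0.0, 1.0, 10);
        Random rnd = new Random(42);
        // Roughly the 7M points mentioned in the thread stay on the data side...
        for (int i = 0; i < 7_000_000; i++) {
            h.add(rnd.nextDouble());
        }
        // ...and only 10 summary rows are shipped to the presentation side.
        System.out.print(h.toCsv());
        System.out.println(h.toJson());
    }
}
```

The histogram is just one possible summary. The point is that the raw points never leave the data layer, and both renderers consume the same small table, so swapping R for a JavaScript front end only changes the output format.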