[Vaex]() is a lazy, out-of-core DataFrame library for Python for visualizing and exploring big tabular data at roughly a billion rows per second on a decent desktop or laptop. The visualization part of vaex is similar to datashader (see https://github.com/apache/incubator-superset/issues/4492), but vaex is more general.
Vaex focuses strongly on binned statistics on N-d grids. Instead of `groupby` it uses `binby`, which can be used, for instance, to create 1-d histograms:

```python
x_counts = ds.count(binby=ds.x, limits=[-10, 10], shape=64)
```

Or a 2-d array with the means of a column:

```python
z_mean_map = ds.mean(ds.z, binby=[ds.x, ds.y], limits=[[-10, 10], [-20, 20]], shape=(64, 128))
```

I thought it would be interesting to see if I could integrate this in superset, hence this PR, which is only a proof of concept.

I managed to get some visualizations up using the New York Taxi dataset (https://docs.vaex.io/en/latest/datasets.html), which contains over 1 billion rows (although for this test I only used the 2015 data, which contains ~150 million rows).

I got the table view working:

<img width="1202" alt="screen shot 2018-10-03 at 20 51 05" src="https://user-images.githubusercontent.com/1765949/46520723-50fc9d80-c87d-11e8-943b-29ee5ca54a65.png">

Pie charts:

<img width="1854" alt="screen shot 2018-10-03 at 21 04 43" src="https://user-images.githubusercontent.com/1765949/46520800-a20c9180-c87d-11e8-8fd2-28650505baa1.png">

And time series:

<img width="1845" alt="screen shot 2018-10-04 at 20 47 10" src="https://user-images.githubusercontent.com/1765949/46520822-b18bda80-c87d-11e8-8bdd-1625b2d8a4dc.png">

And I think the most beautiful one is the heatmap:

<img width="1000" alt="screen shot 2018-10-03 at 21 38 41" src="https://user-images.githubusercontent.com/1765949/46520859-d41df380-c87d-11e8-8307-437dae3af7a9.png">

Note that computing the data for these visualizations takes only a fraction of a second for these ~150 million rows; ~1 billion rows per second is about the expected performance (per computer).

I am putting this out here to judge interest in this, and as an additional example to https://github.com/apache/incubator-superset/pull/3492

[ Full content available at: https://github.com/apache/incubator-superset/pull/6041 ]
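For readers who have not used `binby`, the two calls above can be approximated in plain NumPy. This is only a conceptual sketch, not how vaex computes it (vaex is lazy and works out of core, and its handling of values on the bin edges may differ). The helper names `binned_count` and `binned_mean` and the synthetic data are made up for illustration.

```python
import numpy as np


def binned_count(x, limits, shape):
    """Count the values of x in `shape` equal-width bins spanning `limits`."""
    counts, _ = np.histogram(x, bins=shape, range=limits)
    return counts


def binned_mean(z, x, y, limits, shape):
    """Mean of z on a regular 2-d grid over (x, y); empty cells become NaN."""
    # Sum of z per cell, then number of rows per cell, on the same grid.
    sums, _, _ = np.histogram2d(x, y, bins=shape, range=limits, weights=z)
    counts, _, _ = np.histogram2d(x, y, bins=shape, range=limits)
    # 0 / 0 for empty cells yields NaN; silence the warning for that case.
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


# Synthetic stand-in data (hypothetical, not the taxi dataset).
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, 100_000)
y = rng.uniform(-20, 20, 100_000)
z = x + y

x_counts = binned_count(x, limits=(-10, 10), shape=64)
z_mean_map = binned_mean(z, x, y, limits=[[-10, 10], [-20, 20]], shape=(64, 128))
```

The result is a fixed-size array whose size depends only on `shape`, not on the number of rows, which is why this kind of aggregation suits charts such as the heatmap.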
