Hi Maxime,

> Evolving the interface would mean carrying the pandas connector along for the ride.
Would you say it is getting more stable, or is it stable?

> Also the problem of where to persist the dataframe.

Vaex mostly uses memory-mapped HDF5 files and relies on the OS page cache, meaning there is practically zero cost if a process gets restarted (apart from the Python startup/import cost). It doesn't matter whether you open a 100MB or a 2TB file, it is practically free (a minimal sketch of what that looks like is at the end of this mail). I always planned to add Arrow support, but I never saw the same performance, so I didn't bother yet. For compatibility, however, it would be great to have.

There is also a (stateless) vaex server for accessing remote datasets, and built on top of that a distributed part (although less mature), which is almost trivial since almost everything vaex computes is embarrassingly parallel.

I'd like to know if a connector for vaex is potentially interesting. I don't know much about Druid's performance, so I don't know how fast it could produce the data for the visualizations I made. If the difference were an order of magnitude, it could be interesting; otherwise it is probably not worth the effort. Maybe you have some statistics on this.

Also, to really make Superset work well with vaex, it probably needs some custom viz as well, like what I demonstrate here:

https://youtu.be/bP-JBbjwLM8?t=1125 (the 150-million-row taxi dataset)
https://youtu.be/bP-JBbjwLM8?t=1741 (1 billion stars)

This is similar to the Heatmap viz, except that it uses numerical data with limits/bounds and adds the ability to zoom/pan (and select/filter); the second sketch at the end of this mail shows the kind of binned aggregation that would feed such a viz. Again, I don't know if this is potentially interesting for Superset.

Regards,
Maarten
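To make the memory-mapping point concrete, here is a minimal sketch; the file path and column name are placeholders borrowed from the taxi example, not something that ships with vaex:

```python
import vaex

# vaex.open() memory-maps the HDF5 file: no column data is loaded into RAM,
# so opening a 100MB file and a 2TB file cost roughly the same.
df = vaex.open('nyc_taxi.hdf5')  # placeholder path

# The row count comes from the file's metadata, essentially free.
print(len(df))

# Aggregations stream over the memory-mapped columns, multithreaded and
# out-of-core; a restarted process pays nothing extra as long as the OS
# page cache is still warm.
print(df.mean(df.tip_amount))  # 'tip_amount' is a placeholder column name
```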
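And a rough sketch of the kind of binned aggregation behind the heatmap-style viz in the videos; again, the column names, limits, and shape are just an assumed example:

```python
import vaex

df = vaex.open('nyc_taxi.hdf5')  # placeholder path, as above

# A 2D count grid over numerical columns with explicit limits/bounds;
# this is what a zoomable/pannable heatmap viz would request. Zooming or
# panning just means calling this again with narrower limits.
grid = df.count(
    binby=[df.pickup_longitude, df.pickup_latitude],  # placeholder columns
    limits=[[-74.05, -73.75], [40.58, 40.90]],        # current viewport
    shape=(256, 256),                                  # output resolution
)
print(grid.shape)  # (256, 256) numpy array, ready to colormap on the client
```

Selecting/filtering would map onto the `selection` argument of the same aggregation call, so each bin is an embarrassingly parallel count that the server (or a distributed setup) can compute independently.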