Hi Maxime,

> Evolving the interface would mean carrying the pandas connector along for the 
> ride.

Would you say it is getting more stable, or is it already stable?


> Also the problem of where to persist the dataframe.

Vaex mostly uses memory-mapped HDF5 files and relies on the OS cache, meaning 
there is practically zero cost if a process gets restarted (apart from Python 
startup/import cost). It doesn't matter whether you open a 100MB or a 2TB 
file; it is practically free.
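
To illustrate why the file size hardly matters, here is a minimal sketch that 
uses only Python's standard `mmap` module, not vaex itself. The file name and 
layout (a flat array of float64 values) are made up for the example; a real 
HDF5 file has more structure, but the principle is the same: mapping the file 
is cheap, and only the pages you touch are read.

```python
import array
import mmap
import os
import struct
import tempfile

# Hypothetical data file: a flat array of native-endian float64 values.
path = os.path.join(tempfile.mkdtemp(), "column.bin")
n_values = 1_000_000
with open(path, "wb") as f:
    array.array("d", range(n_values)).tofile(f)

def read_value(path, index):
    """Map the whole file and read one float64.

    Mapping does not load the file; the OS pages in only what is touched,
    and pages already in the OS cache are reused across process restarts.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return struct.unpack_from("=d", mm, index * 8)[0]

first = read_value(path, 0)
last = read_value(path, n_values - 1)
```

Reading the last value costs about the same as reading the first, regardless 
of how large the file is.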

I have always planned to add Arrow support, but I never saw the same 
performance, so I haven't bothered yet. However, for compatibility, it would 
be great to have.

There is also a (stateless) vaex server for accessing remote datasets and, 
built on top of that, a distributed part (although less mature). The 
distributed part is almost trivial, since almost everything vaex computes is 
embarrassingly parallel.
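
As a rough sketch of what "embarrassingly parallel" means here (plain Python, 
not vaex's actual implementation): a statistic such as the mean can be split 
into per-chunk summaries that are computed independently and merged at the 
end. The function names are hypothetical, and threads stand in for what could 
be separate workers or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(chunk):
    """Per-chunk summary that can be merged later: (count, sum)."""
    return len(chunk), sum(chunk)

def parallel_mean(values, n_chunks=4):
    """Split the data, summarize each chunk independently, then merge."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, chunks))
    # Merging is just adding up the partial counts and sums.
    count = sum(c for c, _ in parts)
    total = sum(s for _, s in parts)
    return total / count

mean = parallel_mean(list(range(1, 101)))
```

Because the chunks never need to talk to each other, it does not matter 
whether they are processed on one machine or on several.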

I'd like to know whether a connector for vaex is potentially interesting. I 
don't know much about Druid's performance, so I don't know how fast it could 
produce the data for the visualizations I made. If the difference were an 
order of magnitude, it could be interesting; otherwise it is probably not 
worth the effort. Maybe you have some statistics on this?

Also, to really make Superset work well with vaex, it probably needs some 
custom viz as well, like what I demonstrate here:
https://youtu.be/bP-JBbjwLM8?t=1125 (the 150 million taxi dataset)
https://youtu.be/bP-JBbjwLM8?t=1741 (1 billion stars)
This is similar to the Heatmap viz, except that it uses numerical data with 
limits/bounds and adds the ability to zoom/pan (and select/filter).
Again, I don't know whether this is potentially interesting for Superset.
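
To make the idea concrete, here is a small sketch in plain Python (not vaex's 
API; `count_on_grid` and its parameters are hypothetical names): numerical x/y 
values are counted on a regular grid within given limits, and zooming simply 
means recomputing the counts with narrower limits.

```python
def count_on_grid(xs, ys, limits, shape=(2, 2)):
    """Count points per grid cell; points outside the limits are skipped."""
    (xmin, xmax), (ymin, ymax) = limits
    nx, ny = shape
    grid = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue
        # Scale into [0, n) and guard against floating-point round-up.
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        grid[i][j] += 1
    return grid

xs = [0.1, 0.2, 0.6, 0.9, 0.95]
ys = [0.1, 0.3, 0.6, 0.9, 0.95]

# Full view, then a "zoom" into the upper-right quadrant.
full = count_on_grid(xs, ys, limits=((0, 1), (0, 1)))
zoomed = count_on_grid(xs, ys, limits=((0.5, 1), (0.5, 1)))
```

The frontend only ever receives a small grid of counts, so the amount of data 
sent to the browser does not grow with the number of rows.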

Regards,

Maarten


[ Full content available at: 
https://github.com/apache/incubator-superset/pull/6041 ]
This message was relayed via gitbox.apache.org for devnull@infra.apache.org