[GitHub] [incubator-superset] mistercrunch commented on issue #6041: POC: Vaex connector

GitHub Sun, 07 Oct 2018 13:52:51 -0700

For reference, someone wrote a `pandas` connector 
(https://github.com/apache/incubator-superset/pull/3492) in the past that we 
never merged. The main reason it wasn't merged is that it was a fair amount of 
code to manage coming from a non-committer, while the connector interface 
wasn't super well-defined and "settled" at that point. Evolving the interface 
would mean carrying the pandas connector along for the ride.


There is also the problem of where to persist the dataframe. Since our web servers are 
stateless, the pandas dataframe needs to be brought up in memory from the 
network prior to performing aggregations / filters. With something like Arrow 
that becomes somewhat reasonable, but it feels like there should be a dedicated 
service (that resembles a database quite a bit) loading/caching/computing on 
those files.
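
To make the cost of the stateless setup concrete, here is a minimal sketch of what each request would have to do. It is illustrative only and is not Superset code: `REMOTE_STORE`, `fetch_bytes`, and `handle_request` are hypothetical names, and an in-memory CSV stands in for a dataframe file (for example an Arrow file) that would really be pulled over the network.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for remote storage; a real deployment would
# fetch a serialized dataframe (e.g. an Arrow file) over the network.
REMOTE_STORE = {
    "sales.csv": b"region,amount\nEU,10\nUS,5\nEU,7\n",
}


def fetch_bytes(key):
    """Simulate pulling the serialized table over the network."""
    return REMOTE_STORE[key]


def handle_request(key, group_col, value_col, min_value=0):
    """Stateless handler: nothing survives between calls, so every
    request re-fetches and re-parses the whole table before it can
    filter and aggregate."""
    raw = fetch_bytes(key)
    rows = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    totals = defaultdict(float)
    for row in rows:
        value = float(row[value_col])
        if value >= min_value:  # filter step
            totals[row[group_col]] += value  # aggregation step
    return dict(totals)


if __name__ == "__main__":
    print(handle_request("sales.csv", "region", "amount"))
```

Every call repeats the fetch and parse, which is the overhead a dedicated service that loads and caches the files once would avoid.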

[ Full content available at: 
https://github.com/apache/incubator-superset/pull/6041 ]
This message was relayed via gitbox.apache.org for [email protected]
