avantgardnerio commented on issue #2581: URL: https://github.com/apache/arrow-rs/issues/2581#issuecomment-1227477075
I'd like to apologize for some of the wording in this issue. It was written in haste, and much of it would be better rephrased as "X could be improved by Y".

Also, I was not aware of [this document](https://github.com/spaceandtimelabs/arrow-ballista/blob/master/docs/source/user-guide/distributed/deployment/kubernetes.md), or more specifically:

```
kind: PersistentVolumeClaim
metadata:
  name: data-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
```

I believe I now understand how others are deploying Ballista in k8s: using shared storage to allow any executor to access any data. This makes it possible to load balance requests to any executor and still get the same data. I was storing it locally on the executor node itself, and responding to queries with Flights carrying the address of the executor that processed the partition, as opposed to the address of the service that points at all executors.

All that being said, now that I understand the recommended way of deploying, I think we can still make this easier on folks (and bonus: work with the JDBC driver in its present state) by allowing the option of proxying the flights through the scheduler. This would provide the best "works out of the box" experience by only requiring one connection to one address, and it would avoid any issues with NATs, routers, subnets, firewalls, VLANs, overzealous IT security, etc.
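
To make the proxying option concrete, here is a rough sketch of the idea only. It is not Ballista's or arrow-flight's actual API: `Endpoint`, `FlightRouting`, `advertise`, and `resolve_executor` are all hypothetical names, and plain strings stand in for real tickets and addresses. The point is that in proxy mode every endpoint handed to the client carries the scheduler's address, and the scheduler keeps the ticket-to-executor mapping so it can forward the fetch.

```rust
/// Hypothetical stand-in for a Flight endpoint: a ticket plus the address
/// the client is told to fetch it from. This is NOT the arrow-flight type.
#[derive(Debug, Clone, PartialEq)]
pub struct Endpoint {
    pub ticket: String,
    pub location: String,
}

/// Hypothetical option controlling which address is advertised to clients.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FlightRouting {
    /// Clients connect to each executor directly (needs network reachability).
    Direct,
    /// Clients fetch every partition through the scheduler's single address.
    ProxyViaScheduler,
}

/// Build the endpoints returned to the client for a completed query.
/// `partitions` holds (ticket, executor address) pairs.
pub fn advertise(
    routing: FlightRouting,
    scheduler_addr: &str,
    partitions: &[(String, String)],
) -> Vec<Endpoint> {
    partitions
        .iter()
        .map(|(ticket, executor_addr)| Endpoint {
            ticket: ticket.clone(),
            location: match routing {
                FlightRouting::Direct => executor_addr.clone(),
                FlightRouting::ProxyViaScheduler => scheduler_addr.to_string(),
            },
        })
        .collect()
}

/// In proxy mode the scheduler has to look up which executor holds a ticket
/// before it can forward the client's fetch to that executor.
pub fn resolve_executor<'a>(
    partitions: &'a [(String, String)],
    ticket: &str,
) -> Option<&'a str> {
    partitions
        .iter()
        .find(|(t, _)| t.as_str() == ticket)
        .map(|(_, addr)| addr.as_str())
}
```

With `FlightRouting::ProxyViaScheduler`, a client such as the JDBC driver only ever sees one address, at the cost of the result data taking an extra hop through the scheduler.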
