siddharthteotia opened a new issue #6921: URL: https://github.com/apache/incubator-pinot/issues/6921
In addition to being used as the in-memory and wire columnar format in several compute engines, Arrow is also commonly used to share data between JVM and non-JVM systems without SerDe overhead. Python users working with Pandas and other analytical libraries can therefore consume the Arrow in-memory format produced by a JVM-based engine. See this example of how PySpark uses Arrow: https://kontext.tech/column/spark/370/improve-pyspark-performance-using-pandas-udf-with-apache-arrow

Arrow Flight is the optimized wire protocol for network transfer of columnar record batches (think of it as an alternative to the JDBC and ODBC protocols). The wire format is the same as the in-memory format. So when both endpoints use Arrow, the Flight protocol can efficiently send result data from Pinot as Arrow record batches to, say, a Python client, which can then continue processing it.
