GitHub user Yicong-Huang created a discussion: Use Polars to avoid data conversion from arrow to pandas
Currently our Tuple and Table in Python are implemented as pandas series/datafram, respectively. And we need to convert from arrow data to pandas, for processing UDF, then convert the results back to arrow to send back to jvm. [Polars](https://github.com/pola-rs/polars) is a data frame API support arrow as the data format (columnar). I think we could switch to Polars data frame (possibly as a drop in replace of pandas data frame), for the two benefits: 1. to avoid the two extra data conversions. 2. to reduce processing speed, as Polars is written in Rust and benchmark shows its processing speed is faster than pandas. GitHub link: https://github.com/apache/texera/discussions/4033 ---- This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
