ColdL opened a new issue, #7011: URL: https://github.com/apache/paimon/issues/7011
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Motivation

[PIP-40](https://cwiki.apache.org/confluence/display/PAIMON/PIP-40%3A+Introduce+a+new+Vector+data+type)

### Solution

As discussed in [PIP-40](https://cwiki.apache.org/confluence/display/PAIMON/PIP-40%3A+Introduce+a+new+Vector+data+type), we propose introducing a dedicated vector data type in Paimon to better support storage and retrieval of vector data for AI workloads.

PIP-40 can be roughly split into two parts: (1) introducing the vector data type itself; (2) allowing users to specify the file format for vector data, to further optimize storage and access efficiency in mixed workloads.

For Part (1), a basic implementation is already available. It includes:

- Introducing a new vector type. To avoid confusion with the existing term "Vector" in the codebase, the new type is named VecType (VecType.java).
- Providing a ColumnVector implementation for the vector type, with support in the paimon-arrow module, so that Arrow-related file formats (e.g., Lance) can map FixedSizeList to VecType.
- Adding Flink-side compatibility: via configuration, a Flink Array can be stored as a Paimon VecType.

For Part (2) (specifying the file format for vector data), work is still in progress. Although the code is still in a draft state, it changes some basic interfaces (e.g., DataGetters), so I'd like to discuss it early.

Any comments on this, @JingsongLi?

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
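
For readers unfamiliar with the FixedSizeList mapping mentioned under Solution, here is a minimal illustrative sketch of the idea: a vector type with a fixed length, whose values can be packed into one contiguous buffer with row `i` starting at offset `i * length` (the way Arrow lays out the child values of a FixedSizeList). The class `VecTypeSketch` and all of its methods are hypothetical names invented for this illustration. They are not the actual `VecType` from the draft implementation, and the real Paimon and Arrow APIs may look quite different.

```java
import java.util.Arrays;

/** Hypothetical sketch of a fixed-length vector type; NOT Paimon's actual VecType. */
public class VecTypeSketch {
    // Fixed number of float elements in every vector of this type.
    private final int length;

    public VecTypeSketch(int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive: " + length);
        }
        this.length = length;
    }

    public int length() {
        return length;
    }

    /** Rejects a value whose size differs from the declared vector length. */
    public void validate(float[] value) {
        if (value.length != length) {
            throw new IllegalArgumentException(
                    "expected " + length + " elements but got " + value.length);
        }
    }

    /** Packs rows into one contiguous buffer; row i starts at offset i * length. */
    public float[] flatten(float[][] rows) {
        float[] flat = new float[rows.length * length];
        for (int i = 0; i < rows.length; i++) {
            validate(rows[i]);
            System.arraycopy(rows[i], 0, flat, i * length, length);
        }
        return flat;
    }

    /** Reads row `row` back out of a buffer produced by flatten. */
    public float[] get(float[] flat, int row) {
        return Arrays.copyOfRange(flat, row * length, (row + 1) * length);
    }

    public static void main(String[] args) {
        VecTypeSketch type = new VecTypeSketch(3);
        float[] flat = type.flatten(new float[][] {{1f, 2f, 3f}, {4f, 5f, 6f}});
        System.out.println(Arrays.toString(type.get(flat, 1))); // prints [4.0, 5.0, 6.0]
    }
}
```

Because every row has the same length, no per-row offsets need to be stored, which is one reason a dedicated fixed-length type can be more compact than a general variable-length array for embedding data.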
