Hi everyone, Dylanhz and I would like to start a discussion on FLIP-590: Introduce Multimodal Data Types: Vector, Tensor, and Image [1].
This FLIP follows the direction of FLIP-577 and proposes first-class multimodal data types for AI-oriented Flink pipelines.

Today, values such as embeddings, tensors, and decoded images are commonly represented as VARBINARY, STRING, ARRAY, or custom ROW structures. These encodings work, but they lose important semantics such as element dtype, tensor shape, vector dimension, image mode, and decoded image layout. This makes it hard for SQL/Table, DataStream, Java UDFs, PyFlink UDFs, and connectors to share a stable contract.

The FLIP proposes three new logical data types:
- TENSOR: a dense n-dimensional tensor with an element dtype and an optional fixed shape.
- VECTOR: a dense, fixed-dimension, one-dimensional vector for embeddings, feature vectors, and vector database integration.
- IMAGE: a decoded static image value with a mode, height, width, and pixel data in HWC (height, width, channel) layout.

It also introduces ElementDType, a shared element dtype enum for tensor, vector, and image payloads. This lets multimodal payloads use dtypes such as uint8, uint32, uint64, float32, and float64 without adding top-level unsigned SQL scalar types to Flink.

The proposal is intentionally scoped to type semantics, runtime representation, Java/PyFlink APIs, serialization, and connector/format boundaries. It does not aim to turn Flink into a tensor computation or image processing framework.

Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-590%3A+Introduce+Multimodal+Data+Type%3A++Vector%2C+Tensor%2C+and+Image

Best,
Biao Geng
