Hi everyone,

Dylanhz and I would like to start a discussion on FLIP-590: Introduce 
Multimodal Data Types: Vector, Tensor, and Image [1].

This FLIP follows the direction of FLIP-577 and proposes first-class multimodal 
data types for AI-oriented Flink pipelines.

Today, values such as embeddings, tensors, and decoded images are commonly 
represented as VARBINARY, STRING, ARRAY, or custom ROW structures. These 
encodings can work, but they lose important semantics such as element dtype, 
tensor shape, vector dimension, image mode, and decoded image layout. This 
makes it hard for SQL/Table, DataStream, Java UDFs, PyFlink UDFs, and 
connectors to share a stable contract.

The FLIP proposes three new logical data types:

- TENSOR: a dense n-dimensional tensor with element dtype and optional fixed 
shape.
- VECTOR: a dense fixed-dimension one-dimensional vector for embeddings, 
feature vectors, and vector database integration.
- IMAGE: a decoded static image value with mode, height, width, and HWC pixel 
data.
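
To make the IMAGE bullet concrete, here is a minimal Python sketch of what "mode, height, width, and HWC pixel data" implies for a consumer. It is not the API proposed in the FLIP: the class name ImageSketch, the CHANNELS_BY_MODE table, and the choice of uint8 pixels are assumptions made only for illustration.

```python
from dataclasses import dataclass

# Hypothetical mode-to-channel table, modeled on common imaging conventions;
# the FLIP may define a different set of modes.
CHANNELS_BY_MODE = {"L": 1, "RGB": 3, "RGBA": 4}


@dataclass(frozen=True)
class ImageSketch:
    """Illustrative stand-in for a decoded image value; not the FLIP API."""

    mode: str
    height: int
    width: int
    data: bytes  # assumed uint8 pixels in HWC (height, width, channel) order

    def __post_init__(self):
        # The metadata fully determines the payload size, so a mismatch
        # can be rejected up front instead of surfacing later in a UDF.
        expected = self.height * self.width * CHANNELS_BY_MODE[self.mode]
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} bytes, got {len(self.data)}")

    def pixel(self, y: int, x: int) -> tuple:
        """Return all channel values of the pixel at row y, column x."""
        channels = CHANNELS_BY_MODE[self.mode]
        start = (y * self.width + x) * channels
        return tuple(self.data[start:start + channels])


# A 2x2 RGB image: red and green on the first row, blue and white on the second.
img = ImageSketch(
    "RGB", 2, 2,
    bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255]),
)
print(img.pixel(1, 0))  # (0, 0, 255)
```

With a plain VARBINARY column, the same 12 bytes would carry none of this information, and every consumer would need an out-of-band agreement on mode and layout.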

It also introduces ElementDType, an element dtype enum shared by tensor, 
vector, and image payloads. This makes element dtypes such as uint8, uint32, 
uint64, float32, and float64 available inside these types without adding 
top-level unsigned SQL scalar types to Flink.
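
As a rough illustration of why the element dtype needs to travel with the payload, the Python sketch below decodes the same four bytes under different dtypes. ElementDTypeSketch and decode are hypothetical names, not the ElementDType enum defined in the FLIP, and INT32 is included only for contrast with UINT32. Little-endian byte order is also an assumption.

```python
import struct
from enum import Enum


class ElementDTypeSketch(Enum):
    """Hypothetical enum; each value is (struct format code, size in bytes)."""

    UINT8 = ("B", 1)
    INT32 = ("i", 4)  # not listed in the email, shown only for contrast
    UINT32 = ("I", 4)
    UINT64 = ("Q", 8)
    FLOAT32 = ("f", 4)
    FLOAT64 = ("d", 8)

    @property
    def code(self) -> str:
        return self.value[0]

    @property
    def itemsize(self) -> int:
        return self.value[1]


def decode(payload: bytes, dtype: ElementDTypeSketch) -> tuple:
    """Decode a little-endian payload into a tuple of elements of `dtype`."""
    if len(payload) % dtype.itemsize != 0:
        raise ValueError("payload length is not a multiple of the element size")
    count = len(payload) // dtype.itemsize
    return struct.unpack(f"<{count}{dtype.code}", payload)


payload = b"\xff\xff\xff\xff"
print(decode(payload, ElementDTypeSketch.UINT32))  # (4294967295,)
print(decode(payload, ElementDTypeSketch.INT32))   # (-1,)
print(decode(payload, ElementDTypeSketch.UINT8))   # (255, 255, 255, 255)
```

The bytes are identical in all three calls; only the dtype decides whether the value is one large unsigned integer, a negative integer, or four separate elements.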

The proposal is intentionally scoped to type semantics, runtime representation, 
Java/PyFlink APIs, serialization, and connector/format boundaries. It does not 
aim to turn Flink into a tensor computation or image processing framework.


Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-590%3A+Introduce+Multimodal+Data+Type%3A++Vector%2C+Tensor%2C+and+Image


Best,
Biao Geng
