cshuo commented on issue #18506:
URL: https://github.com/apache/hudi/issues/18506#issuecomment-4405337541

   Thanks for opening this. I think we can split this into a few pieces.
   
   1. First, on the user-facing API: Flink currently has no native `VECTOR` type, and we probably cannot mirror Spark SQL exactly by adding a `VECTOR` type to the DDL, since the Flink SQL parser is not as easy to extend with new types. One feasible approach is to declare the column as an array in Flink SQL and identify the Hudi vector columns through table options, for example:
   ```sql
       CREATE TABLE t (
         id BIGINT,
         embedding1 ARRAY<FLOAT> NOT NULL,
         embedding2 ARRAY<DOUBLE> NOT NULL,
         PRIMARY KEY (id) NOT ENFORCED
       ) WITH (
         'connector' = 'hudi',
         'path' = '...',
         'hoodie.vector.columns' = 'embedding1:VECTOR(128), 
embedding2:VECTOR(64)'
       );
   ```
   This keeps the Flink schema valid while giving Hudi enough metadata to treat `embedding1` as a `VECTOR(128)` column and `embedding2` as a `VECTOR(64)` column.
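To make the proposal concrete, here is a minimal sketch of how the connector could parse such an option value into per-column dimensions. The option name `hoodie.vector.columns` and its `name:VECTOR(dim)` entry format are the proposal from this comment, not an existing Hudi config, and the class/method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VectorColumnsOption {
    // Matches one entry of the proposed format, e.g. "embedding1:VECTOR(128)".
    private static final Pattern ENTRY =
        Pattern.compile("\\s*(\\w+)\\s*:\\s*VECTOR\\((\\d+)\\)\\s*");

    /** Parses the proposed option value into a column-name -> dimension map. */
    public static Map<String, Integer> parse(String optionValue) {
        Map<String, Integer> dims = new LinkedHashMap<>();
        for (String entry : optionValue.split(",")) {
            Matcher m = ENTRY.matcher(entry);
            if (!m.matches()) {
                throw new IllegalArgumentException(
                    "Malformed vector column spec: " + entry);
            }
            dims.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return dims;
    }

    public static void main(String[] args) {
        // Same value as the CREATE TABLE example above.
        Map<String, Integer> dims =
            parse("embedding1:VECTOR(128), embedding2:VECTOR(64)");
        System.out.println(dims); // {embedding1=128, embedding2=64}
    }
}
```

The connector could run this once at table-factory time and reject specs that name columns absent from the Flink schema.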
   
   2. Then, the Flink read/write path should support Hudi `VECTOR` columns. For 
example, if Spark creates a Hudi table with a vector column, Flink should be 
able to read and write that table correctly. On the Flink side, the 
physical/user-visible type can still be `ARRAY<FLOAT>` / `ARRAY<DOUBLE>` / 
`ARRAY<TINYINT>`, while Hudi preserves the vector descriptor, dimension, and 
storage semantics internally.
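One place where the preserved dimension matters is the write path: since the Flink-visible value is just a plain array, a guard is needed so rows that do not match the declared dimension are rejected before they reach storage. A minimal sketch, assuming a hypothetical per-column check (the method and its placement are illustrative, not existing Hudi API):

```java
public class VectorWriteCheck {
    /**
     * Hypothetical write-path guard: the Flink-visible value is a plain
     * float[] (ARRAY&lt;FLOAT&gt;), while Hudi keeps the declared dimension
     * in its own metadata and enforces it on write.
     */
    public static float[] checkDimension(String column, float[] value, int declaredDim) {
        if (value == null) {
            throw new IllegalArgumentException(
                "Vector column '" + column + "' must not be null");
        }
        if (value.length != declaredDim) {
            throw new IllegalArgumentException(
                "Column '" + column + "' expects VECTOR(" + declaredDim
                    + ") but got an array of length " + value.length);
        }
        return value;
    }

    public static void main(String[] args) {
        float[] ok = checkDimension("embedding1", new float[128], 128);
        System.out.println(ok.length); // 128
    }
}
```

The same check applied on read would let Flink fail fast on a table whose vector descriptor was written by Spark with a different dimension.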
   
   3. Flink `VECTOR_SEARCH` support can be handled as a follow-up. Flink’s 
`VECTOR_SEARCH` works on `FLOAT ARRAY` / `DOUBLE ARRAY` columns directly, so it 
does not fundamentally depend on Hudi’s `VECTOR` logical type. Once Hudi vector 
columns round-trip as Flink arrays, users should already be able to apply 
Flink’s vector search API on top of them; any deeper integration or 
optimization can be tracked separately.
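To illustrate why this decouples cleanly from Hudi's logical type: vector search over plain arrays reduces to a similarity computation on equal-length numeric arrays, which needs nothing from the vector descriptor beyond a consistent length. A self-contained sketch of cosine similarity over `float[]` values (brute force, for illustration only; not Flink's actual `VECTOR_SEARCH` implementation):

```java
public class ArraySimilarity {
    /** Cosine similarity between two equal-length float arrays. */
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] stored1 = {1f, 0f}; // identical direction -> similarity 1.0
        float[] stored2 = {0f, 1f}; // orthogonal -> similarity 0.0
        System.out.println(cosine(query, stored1) > cosine(query, stored2)); // true
    }
}
```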
   

