rahil-c commented on code in PR #18146:
URL: https://github.com/apache/hudi/pull/18146#discussion_r2848519697


##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaType.java:
##########
@@ -119,6 +119,8 @@ public enum HoodieSchemaType {
 
   VARIANT(Schema.Type.RECORD),
 
+  VECTOR(Schema.Type.FIXED),

Review Comment:
    @vinothchandar when I discussed this with Tim, we actually do not want to 
model this as a RECORD with additional fields: 
https://github.com/apache/hudi/pull/18146#discussion_r2793856934
    I would expect the fields-based approach to add overhead, and I'm not sure 
what future extensibility the additional fields would give us that the current 
model doesn't already capture.
    
   In my mind, the only flexibility the dense VECTOR type would need is the 
`storageBacking` field. I'm not sure what other evolution would be needed, since 
we already have all the other required info, such as dimension and element type, 
and those likely do not change once a user defines the column.
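
   As a rough illustration of why dimension and element type are enough for a fixed-size backing, here is a minimal sketch. It is hypothetical and not the code in this PR: the class name `DenseVectorSketch`, the float32 element type, and the little-endian byte order are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not Hudi code.
final class DenseVectorSketch {
  // Byte length of a dense vector: fully determined by dimension and element width.
  static int fixedSize(int dimension, int elementBytes) {
    return Math.multiplyExact(dimension, elementBytes);
  }

  // Packs float32 elements into exactly fixedSize(dimension, 4) bytes.
  // Little-endian order is an assumption of this sketch.
  static byte[] encode(float[] values, int dimension) {
    if (values.length != dimension) {
      throw new IllegalArgumentException(
          "expected " + dimension + " elements, got " + values.length);
    }
    ByteBuffer buf = ByteBuffer.allocate(fixedSize(dimension, Float.BYTES))
        .order(ByteOrder.LITTLE_ENDIAN);
    for (float v : values) {
      buf.putFloat(v);
    }
    return buf.array();
  }

  // Reads the elements back; no per-record metadata is needed beyond the dimension.
  static float[] decode(byte[] bytes, int dimension) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    float[] out = new float[dimension];
    for (int i = 0; i < dimension; i++) {
      out[i] = buf.getFloat();
    }
    return out;
  }
}
```

   In this sketch the encoded length never varies per record, which is the property that makes a fixed-size type a natural fit.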
   
   Regarding sparse vectors: based on RFC 99 
https://github.com/apache/hudi/pull/18184/changes we likely would not use this 
`VECTOR` type, as it's meant for dense vector cases. Instead, we would focus on 
defining a `SPARSE_VECTOR` type with a different backing and different 
expectations, since it has to keep track of the indices of the non-zero positions:
   <img width="1277" height="190" alt="Screenshot 2026-02-24 at 9 15 42 AM" 
src="https://github.com/user-attachments/assets/988997dd-c981-463a-9893-87272ddffa65"
 />
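
   To make the dense versus sparse distinction concrete, here is a minimal sketch of an index-plus-value layout. It is hypothetical: `SparseVectorSketch` and its fields are names invented for this example, and the actual `SPARSE_VECTOR` layout is whatever RFC 99 ends up defining.

```java
// Hypothetical illustration only; not the layout proposed in RFC 99.
final class SparseVectorSketch {
  final int dimension;   // logical length of the vector
  final int[] indices;   // positions of the non-zero elements, ascending
  final float[] values;  // values[k] is the element at indices[k]

  SparseVectorSketch(int dimension, int[] indices, float[] values) {
    if (indices.length != values.length) {
      throw new IllegalArgumentException("indices and values must have the same length");
    }
    this.dimension = dimension;
    this.indices = indices;
    this.values = values;
  }

  // Keeps only the non-zero entries of a dense array.
  static SparseVectorSketch fromDense(float[] dense) {
    int nonZero = 0;
    for (float v : dense) {
      if (v != 0.0f) {
        nonZero++;
      }
    }
    int[] idx = new int[nonZero];
    float[] vals = new float[nonZero];
    int k = 0;
    for (int i = 0; i < dense.length; i++) {
      if (dense[i] != 0.0f) {
        idx[k] = i;
        vals[k] = dense[i];
        k++;
      }
    }
    return new SparseVectorSketch(dense.length, idx, vals);
  }

  // Expands back to a dense array; positions without an entry stay zero.
  float[] toDense() {
    float[] out = new float[dimension];
    for (int k = 0; k < indices.length; k++) {
      out[indices[k]] = values[k];
    }
    return out;
  }
}
```

   Unlike the dense case, the stored size here depends on how many elements are non-zero, so a fixed-size backing would not fit.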


