wombatu-kun commented on PR #18497: URL: https://github.com/apache/hudi/pull/18497#issuecomment-4242014596
Minor code-quality nits - `HoodieSparkLanceWriter#enrichSparkSchemaForLanceVectors`: the local `DataType dt = field.dataType()` is assigned before the `ArrayType` check, then unused once the cast is done. Can collapse to `if (!(field.dataType() instanceof ArrayType))`. - `VectorConversionUtils#buildVectorColumnsMetadataValue` duplicates the format produced by `HoodieSchema#buildVectorColumnsMetadataValue` for Avro schemas. Consider adding a Javadoc cross-reference and asserting the two produce identical strings for equivalent schemas (could even delegate). - `VectorConversionUtils#restoreVectorMetadataFromArrowSchema` walks only top-level fields. If a nested struct ever carries a VECTOR child (unlikely today but possible), it would be missed. Worth a Javadoc note: "Top-level VECTORs only; nested struct children are not recursed into." - `HoodieBaseLanceWriter#close`: the new `additionalSchemaMetadata` hook is called after bloom filter metadata — fine — but the nested `if (writer != null)` check is redundant (already inside the outer `writer != null` for bloom filter); can collapse. - `buildVectorColumnsMetadataValue` returns `""` for schemas without vectors, and the override early-returns `Collections.emptyMap()` in that case — so no footer entry is emitted. Correct, just worth a comment explaining that the hook is called unconditionally but is a no-op when there are no vectors (otherwise a future reader maintaining this code might wonder why a non-VECTOR Lance file has no `hoodie.vector.columns`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
