CodiumAI-Agent commented on PR #9757: URL: https://github.com/apache/incubator-gluten/pull/9757#issuecomment-2910971268
## PR Reviewer Guide π Here are some key observations to aid the review process: <table> <tr><td> **π« Ticket compliance analysis β ** **[9752](https://github.com/apache/incubator-gluten/issues/9752) - PR Code Verified** Compliant requirements: - Support ArrayType conversion in LogicalTypeConverter. - Implement ListVectorAccessor for array vectors. - Implement ArrayVectorWriter. - Add unit tests covering array types in ScanTest. Requires further human verification: - Handling of null or empty arrays at runtime. **[9754](https://github.com/apache/incubator-gluten/issues/9754) - PR Code Verified** Compliant requirements: - Support MapType conversion in LogicalTypeConverter. - Implement MapVectorAccessor. - Implement MapVectorWriter. - Add unit tests covering map types in ScanTest. Requires further human verification: - Handling of null or empty maps at runtime. </td></tr> <tr><td>β±οΈ <strong>Estimated effort to review</strong>: 4 π΅π΅π΅π΅βͺ</td></tr> <tr><td>π§ͺ <strong>PR contains tests</strong></td></tr> <tr><td>π <strong>No security concerns identified</strong></td></tr> <tr><td>β‘ <strong>Recommended focus areas for review</strong><br><br> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/9757/files#diff-bcb790d33c3348eea0bd7acd49e53189c7b581defa2967157de02e10cd176263R158-R207'><strong>Null handling</strong></a> ListVectorAccessor and MapVectorAccessor lack null or empty checks for vectors, which may lead to errors when encountering null entries. </summary> ```java class ListVectorAccessor extends ArrowVectorAccessor { private ListVector vector; private ArrowVectorAccessor elementAccessor; public ListVectorAccessor(FieldVector vector) { this.vector = (ListVector) vector; FieldVector elementVector = this.vector.getDataVector(); this.elementAccessor = ArrowVectorAccessor.create(elementVector); } @Override public Object get(int rowIndex) { int startIndex = vector.getElementStartIndex(rowIndex); int endIndex = vector.getElementEndIndex(rowIndex); Object[] elements = new Object[endIndex - startIndex]; for (int i = startIndex; i < endIndex; i++) { elements[i - startIndex] = elementAccessor.get(i); } return new GenericArrayData(elements); } } // In Arrow, the internal implementation of a map vector is an array vector. class MapVectorAccessor extends ArrowVectorAccessor { private final MapVector vector; private StructVector entriesVector; private ArrowVectorAccessor keyAccessor; private ArrowVectorAccessor valueAccessor; public MapVectorAccessor(FieldVector vector) { this.vector = (MapVector) vector; this.entriesVector = (StructVector) this.vector.getDataVector(); FieldVector keyVector = this.entriesVector.getChild(MapVector.KEY_NAME); FieldVector valueVector = this.entriesVector.getChild(MapVector.VALUE_NAME); this.keyAccessor = ArrowVectorAccessor.create(keyVector); this.valueAccessor = ArrowVectorAccessor.create(valueVector); } @Override public Object get(int rowIndex) { int startIndex = vector.getElementStartIndex(rowIndex); int endIndex = vector.getElementEndIndex(rowIndex); Map<Object, Object> mapEntries = new HashMap<>(); for (int i = startIndex; i < endIndex; i++) { Object key = keyAccessor.get(i); Object value = valueAccessor.get(i); mapEntries.put(key, value); } return new GenericMapData(mapEntries); } } ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/9757/files#diff-bcb790d33c3348eea0bd7acd49e53189c7b581defa2967157de02e10cd176263R181-R207'><strong>Map ordering</strong></a> MapVectorAccessor uses HashMap, resulting in non-deterministic key ordering in output. Consider using LinkedHashMap for predictable ordering. </summary> ```java class MapVectorAccessor extends ArrowVectorAccessor { private final MapVector vector; private StructVector entriesVector; private ArrowVectorAccessor keyAccessor; private ArrowVectorAccessor valueAccessor; public MapVectorAccessor(FieldVector vector) { this.vector = (MapVector) vector; this.entriesVector = (StructVector) this.vector.getDataVector(); FieldVector keyVector = this.entriesVector.getChild(MapVector.KEY_NAME); FieldVector valueVector = this.entriesVector.getChild(MapVector.VALUE_NAME); this.keyAccessor = ArrowVectorAccessor.create(keyVector); this.valueAccessor = ArrowVectorAccessor.create(valueVector); } @Override public Object get(int rowIndex) { int startIndex = vector.getElementStartIndex(rowIndex); int endIndex = vector.getElementEndIndex(rowIndex); Map<Object, Object> mapEntries = new HashMap<>(); for (int i = startIndex; i < endIndex; i++) { Object key = keyAccessor.get(i); Object value = valueAccessor.get(i); mapEntries.put(key, value); } return new GenericMapData(mapEntries); } } ``` </details> <details><summary><a href='https://github.com/apache/incubator-gluten/pull/9757/files#diff-afa8cbf7bf66b8228476179fcf77ecd306a21e95903b7cb3c8c6e6139835b714R47-R82'><strong>Type matching</strong></a> Converters map uses exact class key matching, which wonβt match subclasses or parameterized logical types. Consider instanceof checks or a more flexible registry. </summary> ```java private static Map<Class<?>, VLTypeConverter> converters = Map.ofEntries( Map.entry(BooleanType.class, logicalType -> new io.github.zhztheplayer.velox4j.type.BooleanType()), Map.entry(IntType.class, logicalType -> new io.github.zhztheplayer.velox4j.type.IntegerType()), Map.entry(BigIntType.class, logicalType -> new io.github.zhztheplayer.velox4j.type.BigIntType()), Map.entry(DoubleType.class, logicalType -> new io.github.zhztheplayer.velox4j.type.DoubleType()), Map.entry(VarCharType.class, logicalType -> new io.github.zhztheplayer.velox4j.type.VarCharType()), // TODO: may need precision Map.entry(TimestampType.class, logicalType -> new io.github.zhztheplayer.velox4j.type.TimestampType()), Map.entry(DecimalType.class, logicalType -> { DecimalType decimalType = (DecimalType) logicalType; return new io.github.zhztheplayer.velox4j.type.DecimalType( decimalType.getPrecision(), decimalType.getScale()); }), Map.entry(DayTimeIntervalType.class, logicalType -> new io.github.zhztheplayer.velox4j.type.BigIntType()), Map.entry(RowType.class, logicalType -> { RowType flinkRowType = (RowType) logicalType; List<Type> fieldTypes = flinkRowType.getChildren().stream(). map(LogicalTypeConverter::toVLType). collect(Collectors.toList()); return new io.github.zhztheplayer.velox4j.type.RowType( flinkRowType.getFieldNames(), fieldTypes); }), Map.entry(ArrayType.class, logicalType -> { ArrayType arrayType = (ArrayType) logicalType; Type elementType = toVLType(arrayType.getElementType()); return io.github.zhztheplayer.velox4j.type.ArrayType.create(elementType); }), Map.entry(MapType.class, logicalType -> { MapType mapType = (MapType) logicalType; Type keyType = toVLType(mapType.getKeyType()); Type valueType = toVLType(mapType.getValueType()); return io.github.zhztheplayer.velox4j.type.MapType.create(keyType, valueType); }) ); ``` </details> </td></tr> </table> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
