LaughingVzr opened a new pull request, #5252: URL: https://github.com/apache/hive/pull/5252
### What changes were proposed in this pull request?

Modify the `LazyStruct#findIndexes` and `LazyStruct#parseMultiDelimit` functions, changing the conditions that check `fields.length`:

```java
public void parseMultiDelimit(byte[] rawRow, byte[] fieldDelimit) {
-   if (fields.length > 1 && delimitIndexes[i - 1] != -1) {
+   if (delimitIndexes[i - 1] != -1) {
}

private int[] findIndexes(byte[] array, byte[] target) {
-   if (fields.length <= 1) {
+   if (fields.length < 1) {
    ...
-   for (int i = 1; i < indexes.length; i++) {
+   for (int i = 1; i <= indexes.length; i++) {
    ...
    }
    return indexes;
}
```

I added a test for this fix:

```java
@Test
public void testParseMultiDelimit() throws Throwable {
  try {
    // a single column named "id"
    List<String> columns = new ArrayList<>();
    columns.add("id");
    // the column type is string
    List<TypeInfo> columnTypes = new ArrayList<>();
    PrimitiveTypeInfo primitiveTypeInfo = new PrimitiveTypeInfo();
    primitiveTypeInfo.setTypeName("string");
    columnTypes.add(primitiveTypeInfo);
    // separators + escapeChar => "|"
    byte[] separators = new byte[]{124, 2, 3, 4, 5, 6, 7, 8};
    // null sequence => "\N"
    Text sequence = new Text();
    sequence.set(new byte[]{92, 78});
    // create a lazy struct inspector
    ObjectInspector objectInspector = LazyFactory.createLazyStructInspector(columns, columnTypes, separators, sequence, false, false, (byte) '0');
    LazyStruct lazyStruct = (LazyStruct) LazyFactory.createLazyObject(objectInspector);
    // original row data
    String rowData = "1|@|";
    // row field delimiter
    String fieldDelimiter = "|@|";
    // parse the row using the multi-character delimiter
    lazyStruct.parseMultiDelimit(rowData.getBytes(StandardCharsets.UTF_8), fieldDelimiter.getBytes(StandardCharsets.UTF_8));
    // check the start positions of the first and second fields
    // result before the fix: 0,1
    // result after the fix: 0,2
    Assert.assertArrayEquals(new int[]{0, 2}, lazyStruct.startPosition);
  } catch (Throwable e) {
    e.printStackTrace();
    throw e;
  }
}
```

### Why are the changes needed?
If a table has only one column and uses a multi-character delimiter, querying that column returns incorrect data. When this data is passed to another operation (e.g. a cast using the `UDFToLong` function), the result is NULL.

### Does this PR introduce _any_ user-facing change?

No

### Is the change a dependency upgrade?

No

### How was this patch tested?

- test class: `serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyStruct.java`
- test function: `org.apache.hadoop.hive.serde2.lazy.TestLazyStruct#testParseMultiDelimit`
- test command: `mvn test -Dtest=TestLazyStruct --pl serde`
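For illustration only, the expected `0,2` result can be reproduced outside Hive with a small standalone sketch. It is not the `LazyStruct` code: the class `MultiDelimitSketch` and its methods are hypothetical names, and the sketch assumes that start positions are expressed as if each multi-byte delimiter occupied a single byte, which is an assumption consistent with the test above rather than something stated in this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch, not Hive's LazyStruct implementation.
public class MultiDelimitSketch {

    // Returns the byte offset of every non-overlapping occurrence of target in array.
    static int[] findDelimiterIndexes(byte[] array, byte[] target) {
        List<Integer> found = new ArrayList<>();
        if (target.length == 0) {
            return new int[0];
        }
        int i = 0;
        while (i + target.length <= array.length) {
            if (matchesAt(array, target, i)) {
                found.add(i);
                i += target.length; // skip past the whole delimiter
            } else {
                i++;
            }
        }
        return found.stream().mapToInt(Integer::intValue).toArray();
    }

    // True if target occurs in array starting at offset.
    static boolean matchesAt(byte[] array, byte[] target, int offset) {
        for (int j = 0; j < target.length; j++) {
            if (array[offset + j] != target[j]) {
                return false;
            }
        }
        return true;
    }

    // Assumption: positions are computed as if every delimiter were one byte wide,
    // so each preceding delimiter shifts the position left by (delimiter.length - 1).
    static int[] collapsedStartPositions(byte[] row, byte[] delimiter) {
        int[] indexes = findDelimiterIndexes(row, delimiter);
        int diff = delimiter.length - 1;
        int[] starts = new int[indexes.length + 1];
        starts[0] = 0; // the first field always starts at 0
        for (int i = 1; i <= indexes.length; i++) {
            starts[i] = indexes[i - 1] + delimiter.length - i * diff;
        }
        return starts;
    }

    public static void main(String[] args) {
        byte[] row = "1|@|".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "|@|".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(findDelimiterIndexes(row, delimiter)));    // prints [1]
        System.out.println(Arrays.toString(collapsedStartPositions(row, delimiter))); // prints [0, 2]
    }
}
```

The point of the sketch is that a row with a single column can still end with a delimiter, so the delimiter search should not be skipped just because there is only one field.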