hu6360567 opened a new issue #11589:
URL: https://github.com/apache/arrow/issues/11589
I'm using `arrow-jdbc` to convert query results from JDBC to Arrow, but the following code behaves unexpectedly.
Assume a SQLite table in which col_2 and col_3 are NULL in the 2nd row:

| col_1 | col_2 | col_3 |
|-------|--------|--------|
| 1 | abc | 3.14 |
| 2 | NULL | NULL |
```java
public void querySql(String query, QueryOption option) throws Exception {
    try (final java.sql.Connection conn = connectContainer.getConnection();
         final Statement stmt = conn.createStatement();
         final ResultSet rs = stmt.executeQuery(query)) {
        // create a config that reuses the VectorSchemaRoot, with the batch size taken from option
        final JdbcToArrowConfig config = new JdbcToArrowConfigBuilder()
                .setAllocator(new RootAllocator())
                .setCalendar(JdbcToArrowUtils.getUtcCalendar())
                .setTargetBatchSize(option.getBatchSize())
                .setReuseVectorSchemaRoot(true)
                .build();
        final ArrowVectorIterator iterator = JdbcToArrow.sqlToArrowVectorIterator(rs, config);
        while (iterator.hasNext()) {
            // retrieve the next batch from the iterator
            final VectorSchemaRoot root = iterator.next();
            option.getCallback().handleBatchResult(root);
            // without this call, NULLs in later batches show the previous batch's values
            root.allocateNew();
        }
    } catch (java.lang.Exception e) {
        // keep the original exception as the cause so its stack trace is not lost
        throw new Exception(e.getMessage(), e);
    }
}
......
// batch_size is set to 1, so the callback is called twice.
QueryOption options = new QueryOption(1,
    root -> {
        // if the printer is not set yet, read the schema and write the header
        if (printer == null) {
            final String[] headers = root.getSchema().getFields().stream()
                    .map(Field::getName)
                    .toArray(String[]::new);
            printer = new CSVPrinter(writer,
                    CSVFormat.Builder.create(CSVFormat.DEFAULT).setHeader(headers).build());
        }
        final int rows = root.getRowCount();
        final List<FieldVector> fieldVectors = root.getFieldVectors();
        // iterate over rows
        for (int i = 0; i < rows; i++) {
            final int rowId = i;
            final List<String> row = fieldVectors.stream()
                    .map(v -> v.getObject(rowId))
                    .map(String::valueOf)
                    .collect(Collectors.toList());
            printer.printRecord(row);
        }
    });
connection.querySql("SELECT * FROM test_db", options);
......
```
If `root.allocateNew()` is called after each batch, the CSV output is as expected:
```
column_1,column_2,column_3
1,abc,3.14
2,null,null
```
Otherwise, the NULL values in the 2nd row are replaced by the values from the 1st row:
```
column_1,column_2,column_3
1,abc,3.14
2,abc,3.14
```
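The repeated values are consistent with how nullable Arrow vectors store data: a value buffer plus a validity bitmap. The toy model below is a simplified sketch, not Arrow's actual code. `StaleValidityDemo`, `ToyNullableColumn` and `readTwoBatches` are hypothetical names, and the model assumes that the JDBC consumers leave a slot untouched when the SQL value is NULL instead of explicitly clearing its validity bit. With a batch size of 1, row 2 is written into slot 0 of the reused root, so a validity bit left over from row 1 makes the old value visible again. Clearing the bitmap, which is roughly what re-allocating the vectors achieves, brings back the expected NULL.

```java
import java.util.Arrays;
import java.util.BitSet;

// Toy model of a nullable column reused across batches (NOT Arrow's implementation).
public class StaleValidityDemo {

    static final class ToyNullableColumn {
        private final BitSet validity = new BitSet(); // bit i set => slot i holds a non-null value
        private final Object[] values;

        ToyNullableColumn(int capacity) {
            values = new Object[capacity];
        }

        // Assumption: a NULL input is "written" by leaving the slot untouched,
        // so neither the stored value nor its validity bit changes.
        void write(int index, Object value) {
            if (value != null) {
                values[index] = value;
                validity.set(index);
            }
        }

        // A slot reads as null unless its validity bit is set.
        Object get(int index) {
            return validity.get(index) ? values[index] : null;
        }

        // Rough analogue of re-allocating the vector: every slot reads as null again.
        void reset() {
            validity.clear();
        }
    }

    // Loads two one-row batches of col_2 into the same column and records what a reader sees.
    public static Object[] readTwoBatches(boolean resetBetweenBatches) {
        ToyNullableColumn col = new ToyNullableColumn(1); // batch size 1, as in the report
        Object[] seen = new Object[2];

        col.write(0, "abc");   // row 1: col_2 = 'abc'
        seen[0] = col.get(0);

        if (resetBetweenBatches) {
            col.reset();
        }
        col.write(0, null);    // row 2: col_2 IS NULL, written into slot 0 again
        seen[1] = col.get(0);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readTwoBatches(false))); // prints [abc, abc]
        System.out.println(Arrays.toString(readTwoBatches(true)));  // prints [abc, null]
    }
}
```

Only the reset step differs between the two runs, mirroring the difference between the two CSV outputs.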
**Question: Should I call `allocateNew` every time? When should I close the
ValueVector/VectorSchemaRoot?**