[
https://issues.apache.org/jira/browse/ARROW-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534011#comment-17534011
]
Todd Farmer commented on ARROW-16035:
-------------------------------------
[~jswenson] : Thanks for your review and comments! Here's my thinking:
I understand the comment about ResultSet.isLast() being optional only for
TYPE_FORWARD_ONLY ResultSets. I see the same note about optional support in
the [documentation for
isAfterLast()|https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/ResultSet.html#isAfterLast()]:
{code:java}
boolean isAfterLast() throws SQLException
Retrieves whether the cursor is after the last row in this ResultSet object.
Note: Support for the isAfterLast method is optional for ResultSets with a
result set type of TYPE_FORWARD_ONLY
{code}
While a JDBC driver could support one of the above methods but not the
other, I considered them effectively equivalent for our purposes. Both are
optional in the same context; isLast() is potentially expensive, while
isAfterLast() is ambiguous for empty results. If a JDBC driver elects not to
support these methods, there is no way to tell whether an empty ResultSet has
been supplied before the underlying ResultSet is first read: we literally
have to call ResultSet.next() to see what gets returned. That doesn't happen
during initialization of the ArrowVectorIterator object; it only happens when
ArrowVectorIterator.next() is called. In effect, there is no way to support
ArrowVectorIterator.hasNext() without first calling
ArrowVectorIterator.next(), at least when the JDBC driver vendor has elected
not to implement the optional methods.
We could elect to have ArrowVectorIterator.hasNext() return true until
ArrowVectorIterator.next() has been called, and rely entirely on the added
"readComplete" variable to indicate whether the ResultSet has been fully
read. That would result in the following strange behavior, though:
{code:java}
ArrowVectorIterator iter =
    JdbcToArrow.sqlToArrowVectorIterator(emptyResultSet, config);
boolean willBeTrue = iter.hasNext();  // always true until next() is called
// Reads the ResultSet for the first time and discovers it's empty.
VectorSchemaRoot root = iter.next();
int willBeZero = root.getRowCount();  // no rows!
boolean willBeFalse = iter.hasNext(); // now false, after reading the ResultSet
{code}
Effectively, this would make ArrowVectorIterator _always_ return at least one
VectorSchemaRoot, even when the underlying ResultSet is empty. I'm not
entirely clear whether this is acceptable.
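To make the trade-off concrete, here is a minimal sketch of that approach in
plain Java. It is not the Arrow implementation: ForwardCursor, ArrayCursor,
LazyBatchIterator and EmptyResultDemo are hypothetical stand-ins (for a
TYPE_FORWARD_ONLY ResultSet without isLast()/isAfterLast(), and for
ArrowVectorIterator), and only the "readComplete" name is taken from the
change under discussion.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Demo: an empty source still yields exactly one (empty) batch. */
class EmptyResultDemo {
    public static void main(String[] args) {
        LazyBatchIterator iter = new LazyBatchIterator(new ArrayCursor(), 2);
        System.out.println(iter.hasNext());      // true: nothing has been read yet
        System.out.println(iter.next().size());  // 0: the first read finds no rows
        System.out.println(iter.hasNext());      // false: readComplete is now set
    }
}

/** Hypothetical stand-in for a forward-only ResultSet: next() is the only probe. */
interface ForwardCursor {
    boolean next();
    int current();
}

/** Hypothetical in-memory cursor, used only to drive the sketch. */
class ArrayCursor implements ForwardCursor {
    private final int[] values;
    private int position = -1;

    ArrayCursor(int... values) {
        this.values = values;
    }

    @Override
    public boolean next() {
        if (position + 1 < values.length) {
            position++;
            return true;
        }
        return false;
    }

    @Override
    public int current() {
        return values[position];
    }
}

/** Hypothetical stand-in for ArrowVectorIterator; a batch plays the VectorSchemaRoot. */
class LazyBatchIterator implements Iterator<List<Integer>> {
    private final ForwardCursor cursor;
    private final int batchSize;
    private boolean readComplete = false; // set once the cursor is exhausted

    LazyBatchIterator(ForwardCursor cursor, int batchSize) {
        this.cursor = cursor;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Nothing is known about the source until a read happens, so this
        // stays true until next() has run into the end of the data.
        return !readComplete;
    }

    @Override
    public List<Integer> next() {
        if (readComplete) {
            throw new NoSuchElementException();
        }
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < batchSize) {
            if (!cursor.next()) {
                readComplete = true; // end discovered only by reading
                break;
            }
            batch.add(cursor.current());
        }
        return batch;
    }
}
```

In this sketch the end of the data is only ever discovered by a read, so a
source whose row count is an exact multiple of the batch size also produces
one trailing empty batch, for the same reason the empty source does.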
While not really related to your overall point, I'm having difficulty
identifying a context in which ResultSet.isLast() would be expensive, while
ResultSet.isAfterLast() would not be.
> [Java] Arrow to JDBC ArrowVectorIterator with does not terminate with empty
> result set
> --------------------------------------------------------------------------------------
>
> Key: ARROW-16035
> URL: https://issues.apache.org/jira/browse/ARROW-16035
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Affects Versions: 7.0.0
> Reporter: Jonathan Swenson
> Assignee: Todd Farmer
> Priority: Major
> Labels: pull-request-available
> Fix For: 8.0.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Using an ArrowVectorIterator built from a JDBC Result Set that is empty
> causes the iterator to never terminate.
> {code:java}
> ArrowVectorIterator iterator = JdbcToArrow.sqlToArrowVectorIterator(
>     conn.createStatement().executeQuery("select 1 from table1 where false"),
>     config);
> {code}
>
> It appears as though this is due to the implementation of the
> [hasNext()|https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L158]
> method.
> The expectation is that the `isAfterLast()` method on a JDBC result set
> returns true when the result set is empty. However, according to the [JDBC
> documentation|https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/ResultSet.html#isAfterLast()]
> it always returns false when the result set is empty.
> {quote}Returns: {{true}} if the cursor is after the last row; {{false}} if
> the cursor is at any other position or the result set contains no rows
> {quote}
>