[
https://issues.apache.org/jira/browse/PHOENIX-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643844#comment-17643844
]
Hari Krishna Dara commented on PHOENIX-6821:
--------------------------------------------
While trying to understand the current implementation, I noticed something odd
and I am proposing that we fix it. The issue is that the JDBC batch API appears
to be meant only for DML or DDL statements (i.e., _not_ for DQL). While the
spec doesn't say this explicitly, here is how I came to that conclusion:
* The
[javadoc|https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#addBatch-java.lang.String-]
for {{addBatch}} says {_}“typically this is a SQL INSERT or UPDATE
statement”{_}. This implies “typically this is not a SELECT statement”, which
is slightly vague and leaves some room for interpretation. However,
* if you look at the {{executeBatch}} API, you will see that there is no
provision to return a {{{}ResultSet{}}}. The return value is an {{int[]}}
holding the update count for each statement in the batch, so it seems DQL was
not considered.
* I also ran a quick experiment. I took a SQLite JDBC batch API sample, added
a SELECT statement to the batch, and got the error
{{{}java.sql.BatchUpdateException: batch entry 2: query returns results{}}},
which means the driver actively detects when a statement would return a
ResultSet and flags it as an error.
* I then repeated the same experiment on MySQL and got the exception
{{java.sql.BatchUpdateException: Statement.executeUpdate() or
Statement.executeLargeUpdate() cannot issue statements that produce result
sets.}}
I am not against supporting this in Phoenix as a sort of “extension”, but I
have a few concerns:
# As I mentioned above, the batch API itself doesn’t provide a way to access
result sets. The feature may still be usable via {{getResultSet}} on the
statement; however, that also covers result sets from any DML executed ahead
of the batch on the same statement object that are still open, so it can be
quite unwieldy and even misleading in some situations.
# Even if we are willing to live with the above limitation, the JDBC
documentation itself doesn’t mention this ability (in fact, I can’t find any
reference to it, such as blog or forum posts, via Google), so even if we
continue to support it, it is unlikely that anyone would actually make use of it.
Considering all of the above, I think we should stop supporting DQL in batches
and flag its usage as an error, like the SQLite and MySQL drivers do.
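As a rough illustration of what such a check could look like, here is a minimal sketch that rejects DQL at batch-validation time with an error message shaped like SQLite's. The {{isQuery}} keyword test is a deliberate oversimplification for the sake of the example; Phoenix would consult its parser's statement type instead:

```java
import java.sql.BatchUpdateException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BatchDqlCheck {
    // Naive illustration only: a real implementation would ask the parsed
    // statement for its operation type rather than inspect the SQL text.
    static boolean isQuery(String sql) {
        return sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
    }

    // Reject any batch entry that would produce a ResultSet, mirroring
    // the behavior observed in the SQLite and MySQL drivers.
    static void validateBatch(List<String> batch) throws BatchUpdateException {
        for (int i = 0; i < batch.size(); i++) {
            if (isQuery(batch.get(i))) {
                throw new BatchUpdateException(
                        "batch entry " + i + ": query returns results", new int[0]);
            }
        }
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        batch.add("UPSERT INTO t VALUES (1, 'a')");
        batch.add("SELECT * FROM t");
        try {
            validateBatch(batch);
        } catch (BatchUpdateException e) {
            System.out.println(e.getMessage()); // batch entry 1: query returns results
        }
    }
}
```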
> Batching with auto-commit connections
> -------------------------------------
>
> Key: PHOENIX-6821
> URL: https://issues.apache.org/jira/browse/PHOENIX-6821
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Hari Krishna Dara
> Priority: Major
>
> Phoenix commits the commands of a batch individually when executeBatch() is
> called if auto commit is enabled on the connection. For example, if a batch
> of 100 upsert statements is created using addBatch() within an auto-commit
> mode connection then when executeBatch() is called, Phoenix creates 100 HBase
> batches each with a single mutation, i.e., one for each upsert. This defeats
> the purpose of batching. The correct behavior is to commit the entire batch
> of upsert statements using the minimum number of HBase batches. This means if
> the entire batch of upsert statements fits in a single HBase batch, then one
> HBase batch should be used.
> Please note for connections without auto-commit, Phoenix behaves correctly,
> that is, the entire batch of upsert commands is committed using the minimum
> number of HBase batches.
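The reported behavior versus the intended one in the description above can be sketched as a simple batch-count calculation. The {{maxHBaseBatchSize}} limit here is a hypothetical parameter for illustration; the actual limit in Phoenix/HBase is configuration-dependent:

```java
public class BatchCountSketch {
    // Current auto-commit behavior as described: each of the N upserts is
    // flushed as its own HBase batch, so N statements cost N round trips.
    static int batchesWithPerStatementCommit(int statements) {
        return statements;
    }

    // Intended behavior: pack the whole JDBC batch into the minimum number
    // of HBase batches, i.e. ceil(statements / maxHBaseBatchSize).
    static int batchesWithGroupedCommit(int statements, int maxHBaseBatchSize) {
        return (statements + maxHBaseBatchSize - 1) / maxHBaseBatchSize;
    }

    public static void main(String[] args) {
        System.out.println(batchesWithPerStatementCommit(100)); // 100
        System.out.println(batchesWithGroupedCommit(100, 100)); // 1
        System.out.println(batchesWithGroupedCommit(250, 100)); // 3
    }
}
```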
--
This message was sent by Atlassian Jira
(v8.20.10#820010)