[
https://issues.apache.org/jira/browse/PHOENIX-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643844#comment-17643844
]
Hari Krishna Dara commented on PHOENIX-6821:
--------------------------------------------
While trying to understand the current implementation, I noticed something odd
and I am proposing that we fix it. The issue is that the JDBC batch API appears
to be meant only for DML or DDL statements (i.e., _not_ for DQL). While the
spec doesn't say this explicitly, here is how I came to that conclusion:
* The
[javadoc|https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#addBatch-java.lang.String-]
for {{addBatch}} says {_}“typically this is a SQL INSERT or UPDATE
statement”{_}. This implies “typically this is not a SELECT statement”, which
is slightly vague and leaves some room for interpretation. However,
* if you look at the {{executeBatch}} API, you will see that there is no
provision to return a {{{}ResultSet{}}}. The return value is an {{int[]}}
holding the update count for each statement in the batch, so it seems DQL was
not considered.
* I also ran a quick experiment. I took a SQLite JDBC batch API sample, added
a SELECT statement to the batch, and got the error
{{{}java.sql.BatchUpdateException: batch entry 2: query returns results{}}},
which means the driver actively detects when a statement would return a
ResultSet and flags it as an error.
* I then repeated the same experiment on MySQL and got the exception
{{java.sql.BatchUpdateException: Statement.executeUpdate() or
Statement.executeLargeUpdate() cannot issue statements that produce result
sets.}}
I am not against supporting this in Phoenix as a sort of “extension”, but I
have a few concerns:
# As I mentioned above, the batch API itself doesn’t provide a way to access
result sets. The feature may still be usable via {{getResultSet}} on the
statement; however, that also covers result sets from any DML executed ahead
of the batch on the same statement object that are still open, so it can be
quite unwieldy and even misleading in some situations.
# Even if we are willing to live with the above limitation, the JDBC
documentation itself doesn’t mention this ability (in fact, I can’t find any
reference to it, such as blog or forum posts, via Google), so even if we
continue to support it, it is unlikely that anyone would actually make use of it.
Considering all of the above, I think we should stop supporting DQL in batches
and flag its usage as an error, like the SQLite and MySQL drivers do.
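As a rough illustration of what such a check could look like, here is a minimal sketch that rejects DQL at batch-validation time with an error message shaped like SQLite's. The {{isQuery}} keyword test is a deliberate oversimplification for the sake of the example; Phoenix would consult its parser's statement type instead:

```java
import java.sql.BatchUpdateException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BatchDqlCheck {
    // Naive illustration only: a real implementation would ask the parsed
    // statement for its operation type rather than inspect the SQL text.
    static boolean isQuery(String sql) {
        return sql.trim().toLowerCase(Locale.ROOT).startsWith("select");
    }

    // Reject any batch entry that would produce a ResultSet, mirroring
    // the behavior observed in the SQLite and MySQL drivers.
    static void validateBatch(List<String> batch) throws BatchUpdateException {
        for (int i = 0; i < batch.size(); i++) {
            if (isQuery(batch.get(i))) {
                throw new BatchUpdateException(
                        "batch entry " + i + ": query returns results", new int[0]);
            }
        }
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        batch.add("UPSERT INTO t VALUES (1, 'a')");
        batch.add("SELECT * FROM t");
        try {
            validateBatch(batch);
        } catch (BatchUpdateException e) {
            System.out.println(e.getMessage()); // batch entry 1: query returns results
        }
    }
}
```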
> Batching with auto-commit connections
> -------------------------------------
>
> Key: PHOENIX-6821
> URL: https://issues.apache.org/jira/browse/PHOENIX-6821
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Hari Krishna Dara
> Priority: Major
>
> Phoenix commits the commands of a batch individually when executeBatch() is
> called if auto commit is enabled on the connection. For example, if a batch
> of 100 upsert statements is created using addBatch() within an auto-commit
> mode connection then when executeBatch() is called, Phoenix creates 100 HBase
> batches each with a single mutation, i.e., one for each upsert. This defeats
> the purpose of batching. The correct behavior is to commit the entire batch
> of upsert statements using the minimum number of HBase batches. This means if
> the entire batch of upsert statements fits in a single HBase batch, then one
> HBase batch should be used.
> Please note for connections without auto-commit, Phoenix behaves correctly,
> that is, the entire batch of upsert commands is committed using the minimum
> number of HBase batches.
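The reported behavior versus the intended one in the description above can be sketched as a simple batch-count calculation. The {{maxHBaseBatchSize}} limit here is a hypothetical parameter for illustration; the actual limit in Phoenix/HBase is configuration-dependent:

```java
public class BatchCountSketch {
    // Current auto-commit behavior as described: each of the N upserts is
    // flushed as its own HBase batch, so N statements cost N round trips.
    static int batchesWithPerStatementCommit(int statements) {
        return statements;
    }

    // Intended behavior: pack the whole JDBC batch into the minimum number
    // of HBase batches, i.e. ceil(statements / maxHBaseBatchSize).
    static int batchesWithGroupedCommit(int statements, int maxHBaseBatchSize) {
        return (statements + maxHBaseBatchSize - 1) / maxHBaseBatchSize;
    }

    public static void main(String[] args) {
        System.out.println(batchesWithPerStatementCommit(100)); // 100
        System.out.println(batchesWithGroupedCommit(100, 100)); // 1
        System.out.println(batchesWithGroupedCommit(250, 100)); // 3
    }
}
```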
--
This message was sent by Atlassian Jira
(v8.20.10#820010)