Csaba Ringhofer created IMPALA-12647:
----------------------------------------
Summary: Add Hive compatible way to get modified row count in DMLs (HS2)
Key: IMPALA-12647
URL: https://issues.apache.org/jira/browse/IMPALA-12647
Project: IMPALA
Issue Type: New Feature
Components: Clients
Reporter: Csaba Ringhofer
e.g. after
insert into t values (1);
print "modified 1 row(s)"
Hive and Impala implemented this in incompatible ways using different HS2
"dialects":
- HIVE-14388 added support using TGetOperationStatusResp.numModifiedRows
- IMPALA-7290 added support using TCloseImpalaOperationResp.TDmlResult
https://github.com/apache/hive/blob/fd92b3926393f0366b87cd55d5a0ad27968f18db/service-rpc/if/TCLIService.thrift#L1120
https://github.com/apache/impala/blob/4114fe8db6ec80b2e1679e946555f91ab7043f2e/common/thrift/ImpalaService.thrift#L966
The Impala patch is newer (probably we didn't know about the Hive solution?);
on the other hand, it is based on a much older solution in Beeswax. The Impala
solution is also more "advanced" and contains extra information that is
relevant for Kudu upserts/inserts.
Currently impala-shell uses the Impala solution, while in Hive-compatible
strict HS2 mode it doesn't return the modified row count.
impyla doesn't support modified row count:
https://github.com/cloudera/impyla/issues/302
There is an extension function that parses Kudu-related row counts from the
profile:
https://github.com/cloudera/impyla/blob/76f0ba3221e1ff26037e36afbe4a5591168157ce/impala/hiveserver2.py#L205
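For illustration, parsing row counts out of a profile could look roughly like
the sketch below. This is not the impyla code. The counter names
NumModifiedRows and NumRowErrors, and the assumption that they appear as plain
"Name: <integer>" lines, should be checked against a real profile; the sample
text is made up for the example, and human-readable counter formats would need
extra handling.

```python
import re

# Assumed counter names and "Name: <integer>" layout; verify against a real
# query profile before relying on this.
_COUNTER_RE = re.compile(r"\b(NumModifiedRows|NumRowErrors):\s*(\d+)")


def kudu_row_counts(profile_text):
    """Return the Kudu DML counters found in profile_text as a dict.

    Counters that do not appear in the text are simply absent from the result.
    """
    return {name: int(value) for name, value in _COUNTER_RE.findall(profile_text)}


# Made-up fragment for illustration only, not real Impala output.
sample_profile = "NumModifiedRows: 1\nNumRowErrors: 0\n"
counts = kudu_row_counts(sample_profile)
```

Scraping the profile like this is fragile, which is one more reason to prefer
a row count that is returned as a proper field in the HS2 response.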
Ideally there would be a solution supported by both components, so that
clients wouldn't need to adapt to specific dialects.
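To show what adapting to both dialects means for a client today, here is a
minimal sketch. The two dataclasses are hypothetical stand-ins for the Thrift
structs, not the generated classes. numModifiedRows is the Hive field named
above; rows_modified (a per-partition map) and num_row_errors are my
understanding of TDmlResult and should be treated as assumptions.
modified_row_count is a made-up helper name.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HiveStatusResp:
    """Hypothetical stand-in for Hive's TGetOperationStatusResp."""
    numModifiedRows: Optional[int] = None


@dataclass
class ImpalaDmlResult:
    """Hypothetical stand-in for Impala's TDmlResult (field names assumed)."""
    rows_modified: Dict[str, int] = field(default_factory=dict)  # per partition
    num_row_errors: Optional[int] = None  # only meaningful for Kudu tables


def modified_row_count(status_resp=None, dml_result=None):
    """Return (modified_rows, row_errors); both are None if nothing is known."""
    # Hive dialect: a single counter on the operation status response.
    if status_resp is not None and status_resp.numModifiedRows is not None:
        return status_resp.numModifiedRows, None
    # Impala dialect: per-partition counts, summed into one total, plus the
    # extra Kudu error count that the Hive dialect has no slot for.
    if dml_result is not None:
        return sum(dml_result.rows_modified.values()), dml_result.num_row_errors
    return None, None
```

For example, modified_row_count(dml_result=ImpalaDmlResult({"": 1}, 0)) yields
(1, 0), while the Hive path can only ever fill in the first element. A shared
solution would remove the need for this kind of branching in every client.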