[
https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620713#comment-17620713
]
Qihong Jiang edited comment on IMPALA-11677 at 10/20/22 5:04 AM:
-----------------------------------------------------------------
Hello [~csringhofer], I'm only using non-transactional tables right now, and
it's equally slow. I tried using the Bulk API last week, but the improvement
was very small. I then referred to the code in Impala 3.x and changed the call
to be asynchronous. The execution speed is greatly improved, but I don't know
whether this introduces any risk.
{code:java}
public static List<Long> fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    // Non-transactional tables: fire the events on a background thread so the
    // caller does not wait for the metastore call.
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread =
        Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName,
            tableName);
      } catch (Exception e) {
        // Pass the exception to the logger so the cause is not lost.
        LOG.error("failed to async call fireInsertEventHelper", e);
      } finally {
        msClient.close();
        LOG.info("fire the insert events asynchronously end.");
      }
    }, fireInsertEventThread)
        .thenRun(() -> fireInsertEventThread.shutdown());
  } else {
    // Transactional tables: keep the synchronous path.
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  // Neither path collects event ids; an empty list is always returned.
  return Collections.emptyList();
}
{code}
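For reference, the asynchronous pattern above can be sketched in isolation as follows. This is only a minimal sketch under stated assumptions, not Impala code: `AsyncFireSketch`, `FakeClient`, and `fireAsync` are hypothetical stand-ins for the real client and helper. It differs from the snippet above in two ways: it reuses one shared single-thread executor instead of creating one per call, and it uses `whenComplete` so the client is closed whether or not the call throws.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class AsyncFireSketch {
  // Hypothetical stand-in for a pooled metastore client.
  static final class FakeClient implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);
    @Override public void close() { closed.set(true); }
  }

  // One shared daemon worker, reused across calls instead of one executor per call.
  private static final ExecutorService EXECUTOR =
      Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "fire-insert-events");
        t.setDaemon(true);
        return t;
      });

  // Runs fireEvents in the background; always closes the client afterwards.
  static CompletableFuture<Void> fireAsync(FakeClient client, Runnable fireEvents) {
    return CompletableFuture.runAsync(fireEvents, EXECUTOR)
        .whenComplete((ignored, err) -> {
          if (err != null) {
            System.err.println("Failed to fire insert events: " + err);
          }
          client.close();
        });
  }

  public static void main(String[] args) {
    FakeClient ok = new FakeClient();
    fireAsync(ok, () -> System.out.println("events fired")).join();

    FakeClient failing = new FakeClient();
    fireAsync(failing, () -> { throw new IllegalStateException("HMS unavailable"); })
        .exceptionally(e -> null) // swallow the failure so join() does not throw
        .join();

    // Both clients are closed, on success and on failure.
    System.out.println(ok.closed.get() + " " + failing.closed.get());
  }
}
```

The caller gets a future back, so it can choose to ignore it (fire and forget) or wait on it when it needs to know the events were fired.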
I am not an expert in Impala and would appreciate your guidance. Thank you!
was (Author: JIRAUSER289149):
Hello [~csringhofer], I'm only using non-transactional tables right now, and
it's equally slow. I tried using the Bulk API last week, but the improvement
was very small. I then referred to the code in Impala 3.x and changed the call
to be asynchronous. The execution speed is greatly improved, but I don't know
whether this introduces any risk.
{code:java}
public static List<Long> fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread =
        Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName,
            tableName);
      } catch (Exception e) {
        LOG.error("failed to async call fireInsertEventHelper");
      }
    }, fireInsertEventThread)
        .thenRun(() -> {
          LOG.info("fire the insert events asynchronously end.");
          msClient.close();
          fireInsertEventThread.shutdown();
        });
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}
I am not an expert in Impala and would appreciate your guidance. Thank you!
> FireInsertEvents function can be very slow for tables with large number of
> partitions.
> --------------------------------------------------------------------------------------
>
> Key: IMPALA-11677
> URL: https://issues.apache.org/jira/browse/IMPALA-11677
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 4.1.0
> Reporter: Qihong Jiang
> Assignee: Qihong Jiang
> Priority: Major
>
> In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java,
> the fireInsertEvents function can be very slow for tables with a large number
> of partitions, so we should use asynchronous calls, just like in impala-3.x.