[
https://issues.apache.org/jira/browse/HIVE-27283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-27283:
--------------------------------
Description:
Today, a query like this:
{code}
INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28),
  ('barney rubble', 32, 2.32);
{code}
spins up a TezAM and containers. I believe this is not optimal, even if we
already have a Tez application running, not to mention setups where only
HiveServer2 is alive and TezAMs + LLAP executors are spun up on demand.
A possible risk of this optimization is overwhelming HiveServer2 with such
queries, so this scenario should be handled with care.
My proposal is to maintain a local Tez session pool in HS2 (default size 0,
recommended 1 to 4) and to identify at compile time the "trivial queries"
that currently need a Tez application (like the INSERT INTO above).
The first implementation can cover only simple INSERT INTO queries, and we
can decide the rest later.
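A minimal sketch of how the routing could look, only to make the idea concrete. Every name here (TrivialQueryRouter, isTrivial, run) is hypothetical and nothing below is existing Hive or Tez API. It also assumes that "trivial" can be approximated by matching the statement text, whereas a real implementation would more likely inspect the compiled plan. A semaphore stands in for the local session pool: size 0 disables local execution, and an exhausted pool falls back to the regular cluster path instead of queueing inside HS2.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Hypothetical sketch: routes trivial queries to a bounded local pool. */
final class TrivialQueryRouter {
    // Assumption: "trivial" means a plain INSERT INTO ... VALUES statement.
    private static final Pattern INSERT_VALUES = Pattern.compile(
        "^\\s*INSERT\\s+INTO\\s+(TABLE\\s+)?[\\w.`]+\\s*(\\([^)]*\\)\\s*)?VALUES\\b.*",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // One permit per local session; zero permits means the feature is off.
    private final Semaphore localSlots;

    TrivialQueryRouter(int poolSize) {
        if (poolSize < 0) {
            throw new IllegalArgumentException("poolSize must be >= 0");
        }
        this.localSlots = new Semaphore(poolSize);
    }

    /** Text-based stand-in for a compile-time check on the query plan. */
    static boolean isTrivial(String sql) {
        return INSERT_VALUES.matcher(sql).matches();
    }

    /**
     * Runs the query locally when it is trivial and a local slot is free;
     * otherwise uses the cluster path, so HS2 is never asked to run more
     * local work than the pool size allows.
     */
    <T> T run(String sql, Supplier<T> local, Supplier<T> cluster) {
        if (isTrivial(sql) && localSlots.tryAcquire()) {
            try {
                return local.get();
            } finally {
                localSlots.release();
            }
        }
        return cluster.get();
    }
}
```

Falling back rather than blocking is one way to address the overload risk mentioned above: a burst of trivial queries degrades to today's behavior instead of piling up in HS2.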
was:
Today, a query like this:
{code}
INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28),
  ('barney rubble', 32, 2.32);
{code}
spins up a TezAM and containers. I believe this is not optimal, even if we
already have a Tez application running, not to mention setups where only
HiveServer2 is alive and TezAMs + LLAP executors are spun up on demand.
A possible risk is overwhelming HiveServer2 with such queries, so this scenario
should be handled with care.
My proposal is to maintain a local Tez session pool in HS2 (default size 0,
recommended 1 to 4) and to identify at compile time the "trivial queries"
that by default need a Tez application (like the INSERT INTO above).
The first implementation can cover only simple INSERT INTO queries, and we
can decide the rest later.
> Use tez.local.mode in HiveServer2 for trivial queries
> -----------------------------------------------------
>
> Key: HIVE-27283
> URL: https://issues.apache.org/jira/browse/HIVE-27283
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Priority: Major
>
> Today, a query like this:
> {code}
> INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28),
>   ('barney rubble', 32, 2.32);
> {code}
> spins up a TezAM and containers. I believe this is not optimal, even if we
> already have a Tez application running, not to mention setups where only
> HiveServer2 is alive and TezAMs + LLAP executors are spun up on demand.
> A possible risk of this optimization is overwhelming HiveServer2 with such
> queries, so this scenario should be handled with care.
> My proposal is to maintain a local Tez session pool in HS2 (default size 0,
> recommended 1 to 4) and to identify at compile time the "trivial queries"
> that currently need a Tez application (like the INSERT INTO above).
> The first implementation can cover only simple INSERT INTO queries, and we
> can decide the rest later.