[ 
https://issues.apache.org/jira/browse/HIVE-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9010:
-----------------------------------
    Description: 
Many major perf features have recently been added (or are being added) to 
Hive, such as vectorization, CBO, Tez, and Spark.
These are off by default, so users of the Apache distribution may not be aware 
of them and may miss out on the speed Hive can offer. 

We can create a Hive perf configuration utility that sets 6-10 important, 
easy-to-set settings. Admins or users can run it when deploying Hive or on an 
existing cluster. Ideally it would cover all the no-brainer set-to-true 
settings, with any caveats described, and possibly a few others. We don't want 
to add tuning options, because the whole point is to be less confusing than 
editing the entire config file. Unless we add automatic tuning at some point, 
users doing perf tuning can edit the config file manually (or use set) after 
reading the docs.

Then we can mention it prominently in the docs and release notes. This should 
go a long way towards making sure users can utilize Hive to its full potential, 
without us enabling large/perf features by default, at least until they are 
stable (e.g. CBO can be enabled by default, so this tool may note that).

Experimental feature settings (true/false or simple) can also be added in a 
separate section.
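
As a rough illustration of the utility described above, here is a minimal 
sketch that keeps a short list of settings with caveats and prints them as 
Hive "set" commands. The three property names are existing Hive settings, but 
choosing them is an assumption (the issue does not list specific settings). 
The caveat texts, PerfSetting, render_set_commands and the "experimental" flag 
are hypothetical names invented for this example, not part of Hive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfSetting:
    """One entry the hypothetical utility would offer to the user."""
    key: str                    # Hive configuration property name
    value: str                  # value the utility would set
    caveat: str = ""            # short note shown to the user, if any
    experimental: bool = False  # listed in a separate section when True


# Illustrative selection only (assumption): the issue names the features
# (vectorization, CBO, Tez) but not which properties the tool would cover.
# The caveat strings are placeholders, not official guidance.
SETTINGS = [
    PerfSetting("hive.vectorized.execution.enabled", "true",
                "benefits depend on file format and data types"),
    PerfSetting("hive.cbo.enable", "true",
                "works best with up-to-date table and column statistics"),
    PerfSetting("hive.execution.engine", "tez",
                "requires Tez to be installed on the cluster"),
]


def render_set_commands(settings, include_experimental=False):
    """Return the settings as Hive 'set' commands, one per line.

    Each caveat is emitted as a '--' comment line above its command.
    Experimental entries are skipped unless include_experimental is True.
    """
    lines = []
    for s in settings:
        if s.experimental and not include_experimental:
            continue
        if s.caveat:
            lines.append(f"-- {s.caveat}")
        lines.append(f"set {s.key}={s.value};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_set_commands(SETTINGS))
```

Running the script prints one "set" line per setting, each preceded by its 
caveat as a comment. A real tool would more likely write these values to the 
config file; printing them just keeps the sketch short.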

  was:
Recently, many major perf features have been added (or are being added) to 
Hive, such as vectorization, CBO, Tez, Spark, etc.
These are off by default, and customers using the Apache distribution may not 
be aware of them, and may not take advantage of all the speed Hive can offer. 

We can create a Hive perf configuration utility that will be able to set 6-10 
important, easy-to-set settings. It can be used by admins or users when 
deploying Hive or on an existing cluster. Ideally all the no-brainer 
set-to-true settings would be there, with caveats, if any, described; some 
other ones may be, too, but we don't want to add any options for tuning because 
the whole point is to make it not confusing (as compared to editing the entire 
config file). Unless we have automatic tuning at some point, the users doing 
perf tuning can edit the config file manually after reading the docs.

Then we can mention it prominently in the docs and release notes. This should 
go a long way towards making sure users can utilize Hive to its full potential, 
without us enabling large/perf features by default, at least until they are 
stable (e.g. CBO can be enabled by default, so this tool may note that).

Experimental feature settings (true/false or simple) can also be added in a 
separate section.


> introduce Hive perf configuration utility
> -----------------------------------------
>
>                 Key: HIVE-9010
>                 URL: https://issues.apache.org/jira/browse/HIVE-9010
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Sergey Shelukhin


