[
https://issues.apache.org/jira/browse/KNOX-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650250#comment-16650250
]
Kevin Risden commented on KNOX-1524:
------------------------------------
h2. Test Case and Reproduction
The following results were obtained with:
* a single 4-core, 8GB RAM CentOS 7 VM running on my MacBook Pro laptop
* openjdk version "1.8.0_181"
* Hadoop 3.1.1, single-node pseudo-distributed
* Hive 3.1.0 with a single HiveServer2 node
**
{code:java}
/opt/apache-hive-3.1.0-bin/bin/hiveserver2 \
  --hiveconf hive.server2.transport.mode=http \
  --hiveconf hive.server2.enable.doAs=false \
  --hiveconf fs.hdfs.impl.disable.cache=true \
  --hiveconf fs.file.impl.disable.cache=true
{code}
** Enabling or disabling the filesystem cache did not change the results
* Knox 1.1.0 without SSL
* Data set: [http://stat-computing.org/dataexpo/2009/the-data.html], file 1990.csv (486MB)
* Query: "select *" from a table with a single column
* Results limited to the first 1 million rows
Create table
{code:java}
CREATE TABLE tbl (a string) STORED AS TEXTFILE LOCATION '/tmp/1990';{code}
Testing commands
* HDFS native
**
{code:java}
time hdfs dfs -text /tmp/1990/1990.csv | head -n 1000000 > /dev/null{code}
* Hive binary
**
{code:java}
time /opt/apache-hive-3.1.0-bin/bin/beeline \
  -u 'jdbc:hive2://hive.vagrant:10000/' \
  -n admin -p admin-password \
  -e 'select * from tbl limit 1000000' > /dev/null
{code}
* Hive HTTP
**
{code:java}
time /opt/apache-hive-3.1.0-bin/bin/beeline \
  -u 'jdbc:hive2://hive.vagrant:10001/;transportMode=http;httpPath=cliservice' \
  -n admin -p admin-password \
  -e 'select * from tbl limit 1000000' > /dev/null
{code}
* Hive Knox
**
{code:java}
time /opt/apache-hive-3.1.0-bin/bin/beeline \
  -u 'jdbc:hive2://hive.vagrant:8443/;transportMode=http;httpPath=gateway/sandbox/hive' \
  -n admin -p admin-password \
  -e 'select * from tbl limit 1000000' > /dev/null
{code}
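Since each command above is timed with a single {{time}} invocation, repeating the runs may help separate noise from real differences. A minimal sketch, assuming bash; the helper name {{time_runs}} and the run count are hypothetical and were not part of the original test:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and print the elapsed
# wall-clock seconds of each run. The command's own output is discarded.
time_runs() {
  local runs=$1; shift
  local TIMEFORMAT='%R'   # bash: make the time keyword print only elapsed seconds
  local i elapsed
  for ((i = 1; i <= runs; i++)); do
    # The time keyword reports on the group's stderr, so capture that stream.
    elapsed=$( { time "$@" > /dev/null 2>&1; } 2>&1 )
    printf 'run %d: %ss\n' "$i" "$elapsed"
  done
}
```

A pipeline such as the HDFS native test has to be wrapped in its own shell, for example {{time_runs 3 bash -c 'hdfs dfs -text /tmp/1990/1990.csv | head -n 1000000'}}.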
Assumptions
* JVM startup time is approximately the same for each run
* Hive is using native Hadoop libraries (checked with ps aux | grep native)
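The native library check could be narrowed to the HiveServer2 process itself. A rough sketch, assuming the JVM was started with a {{-Djava.library.path=...}} argument and that its command line contains the HiveServer2 class name; the function name {{hs2_library_path}} is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps` output on stdin and print the value of
# -Djava.library.path for every line that mentions HiveServer2.
hs2_library_path() {
  grep -i 'hiveserver2' \
    | grep -o 'java\.library\.path=[^ ]*' \
    | cut -d= -f2-
}
```

It would be used as {{ps auxww | hs2_library_path}}; an empty result suggests the assumption about the JVM arguments does not hold on that host.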
> Hive "select *" performance evaluation
> --------------------------------------
>
> Key: KNOX-1524
> URL: https://issues.apache.org/jira/browse/KNOX-1524
> Project: Apache Knox
> Issue Type: Task
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Major
> Fix For: 1.2.0
>
>
> While looking at WebHDFS performance in KNOX-1221, I decided to look a bit
> more into performance for common use cases. Hive performance is another area
> that could use some research.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)