[ https://issues.apache.org/jira/browse/KNOX-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673546#comment-16673546 ]
Kevin Risden commented on KNOX-1524:
------------------------------------
Some exciting news on the Hive performance front. Tested with:
* Hadoop 3.1.1
* Hive 3.1.1
* Knox 1.1.0
The results below show that HTTP and binary mode are on par, even through Knox,
with fetchSize bumped up from the default of 1000 to 10000 due to GZip
compression. There are some improvements in Knox master from KNOX-1530 that
should bring the remaining time down. I’ll be testing that next.
{code}
Select 200000 rows from a ~1GB file (1000000 rows, each 1000 characters wide)

HDFS -text
2.87user 0.39system 0:02.62elapsed 124%CPU
2.75user 0.46system 0:02.27elapsed 141%CPU
2.70user 0.63system 0:02.36elapsed 141%CPU

Beeline binary default fetchSize=1000
5.48user 0.47system 0:07.56elapsed 78%CPU
5.28user 0.45system 0:05.56elapsed 102%CPU
6.46user 0.63system 0:06.35elapsed 111%CPU

Beeline http default fetchSize=1000
6.85user 0.56system 0:09.26elapsed 79%CPU
6.86user 0.43system 0:07.08elapsed 102%CPU
6.84user 0.49system 0:07.11elapsed 103%CPU

Beeline knox http default fetchSize=1000
7.43user 0.93system 0:10.87elapsed 76%CPU
8.45user 0.74system 0:09.92elapsed 92%CPU
8.83user 0.81system 0:09.43elapsed 102%CPU

Beeline binary fetchSize=10000
6.64user 0.65system 0:07.42elapsed 98%CPU
6.39user 0.79system 0:07.20elapsed 99%CPU
6.34user 0.76system 0:07.46elapsed 95%CPU

Beeline http fetchSize=10000
7.58user 0.51system 0:07.91elapsed 102%CPU
7.39user 0.62system 0:07.83elapsed 102%CPU
7.61user 0.61system 0:07.97elapsed 103%CPU

Beeline knox http fetchSize=10000
8.05user 0.62system 0:08.51elapsed 101%CPU
8.10user 0.69system 0:08.53elapsed 102%CPU
7.85user 0.73system 0:08.22elapsed 104%CPU
{code}
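To compare the configurations at a glance, the three elapsed times per group can be averaged. The following is a minimal sketch in Python, not part of the original test setup. It assumes each group is a label line followed by timing lines in the `m:ss.ss` elapsed format shown above; the names `mean_elapsed`, `ELAPSED` and `SAMPLE` are made up for illustration.

```python
import re
from statistics import mean

# Matches the elapsed field of a timing line, e.g. "0:07.42elapsed".
# Assumption: runs are shorter than one hour, so there is no hours part.
ELAPSED = re.compile(r"(\d+):(\d+\.\d+)elapsed")

def mean_elapsed(report):
    """Return {label: mean elapsed seconds} for each labelled group of runs."""
    runs = {}
    label = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ELAPSED.search(line)
        if match is None:
            # Any non-timing line starts a new group.
            label = line
            runs[label] = []
        elif label is not None:
            minutes, seconds = int(match.group(1)), float(match.group(2))
            runs[label].append(60 * minutes + seconds)
    # Drop labels that had no timing lines under them (e.g. a title line).
    return {name: mean(times) for name, times in runs.items() if times}

# Two of the groups from the results above, copied verbatim.
SAMPLE = """
Beeline binary fetchSize=10000
6.64user 0.65system 0:07.42elapsed 98%CPU
6.39user 0.79system 0:07.20elapsed 99%CPU
6.34user 0.76system 0:07.46elapsed 95%CPU
Beeline knox http fetchSize=10000
8.05user 0.62system 0:08.51elapsed 101%CPU
8.10user 0.69system 0:08.53elapsed 102%CPU
7.85user 0.73system 0:08.22elapsed 104%CPU
"""

results = mean_elapsed(SAMPLE)
for name, seconds in results.items():
    print(f"{name}: {seconds:.2f}s")
```

For these two groups the means come out at about 7.36s for binary and 8.42s through Knox with fetchSize=10000. With only three runs per configuration, the averages are a rough guide rather than a rigorous benchmark.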
> Hive "select *" performance evaluation
> --------------------------------------
>
> Key: KNOX-1524
> URL: https://issues.apache.org/jira/browse/KNOX-1524
> Project: Apache Knox
> Issue Type: Task
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Major
> Fix For: 1.2.0
>
>
> While looking at WebHDFS performance in KNOX-1221, I decided to look a bit
> more into performance for common use cases. Hive performance is another area
> that could use some research.
> Use "select * ... limit" to get a comparison of raw return speed from
> HiveServer2. This should show how fast results can be streamed through
> HiveServer2 and Knox. Compare the results to "hdfs dfs -text" since this will
> render the data directly from HDFS. This should give comparisons for the
> difference in overhead between HDFS, HiveServer2 binary, HiveServer2 HTTP,
> and HiveServer2 HTTP with Knox.