[
https://issues.apache.org/jira/browse/KNOX-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673639#comment-16673639
]
Kevin Risden commented on KNOX-1524:
------------------------------------
Here are the results from Knox master commit
488e4445cf7e37c0c645b65c77c3a95c06500cfe, run with
[https://github.com/risdenk/knox-performance-tests/tree/knox-nightly].
Knox now performs basically the same as HS2 HTTP mode. With fetchSize=10000,
binary and HTTP mode are also pretty close. There are still some improvements
to be made to HiveServer2, but Knox performance is much better after
KNOX-1530.
{code:java}
Select 200000 rows from a ~1GB file (1000000 1000 character width rows)

HDFS -text
2.43user 0.39system 0:02.10elapsed 134%CPU
2.55user 0.32system 0:02.03elapsed 140%CPU
2.39user 0.35system 0:02.01elapsed 136%CPU

Beeline binary default fetchSize=1000
5.36user 0.50system 0:05.54elapsed 105%CPU
5.29user 0.56system 0:05.33elapsed 109%CPU
5.22user 0.63system 0:05.54elapsed 105%CPU

Beeline http default fetchSize=1000
6.29user 0.74system 0:06.97elapsed 100%CPU
6.81user 0.64system 0:07.02elapsed 106%CPU
6.28user 0.53system 0:06.64elapsed 102%CPU

Beeline knox http default fetchSize=1000
6.61user 0.51system 0:07.90elapsed 90%CPU
6.50user 0.67system 0:07.74elapsed 92%CPU
6.50user 0.44system 0:07.49elapsed 92%CPU

Beeline binary fetchSize=10000
6.09user 0.61system 0:06.77elapsed 98%CPU
6.25user 0.52system 0:06.94elapsed 97%CPU
5.93user 0.72system 0:07.06elapsed 94%CPU

Beeline http fetchSize=10000
7.05user 0.72system 0:07.73elapsed 100%CPU
7.38user 0.60system 0:07.87elapsed 101%CPU
7.11user 0.56system 0:07.55elapsed 101%CPU

Beeline knox http fetchSize=10000
7.03user 0.62system 0:07.90elapsed 96%CPU
7.27user 0.53system 0:07.81elapsed 99%CPU
7.19user 0.55system 0:07.57elapsed 102%CPU{code}
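To compare the runs above more easily, the three timings per configuration can be averaged. The following is a minimal sketch, not part of the test suite linked above: the helper name `mean_elapsed` and the `SAMPLE` excerpt are illustrative assumptions, and the regular expression only assumes the `user/system/elapsed/%CPU` line format shown in the results.

```python
import re
from statistics import mean

# Matches timing lines such as "6.29user 0.74system 0:06.97elapsed 100%CPU".
TIME_LINE = re.compile(
    r"([\d.]+)user ([\d.]+)system (\d+):([\d.]+)elapsed (\d+)%CPU"
)


def mean_elapsed(report):
    """Return {label: mean elapsed seconds} for each labelled group of runs.

    Hypothetical helper: any non-empty line that is not a timing line is
    treated as the label for the timing lines that follow it.
    """
    groups = {}
    label = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TIME_LINE.search(line)
        if match is None:
            label = line
            groups[label] = []
        elif label is not None:
            minutes = int(match.group(3))
            seconds = float(match.group(4))
            groups[label].append(minutes * 60 + seconds)
    # Drop labels that had no timing lines (e.g. the title line).
    return {name: mean(runs) for name, runs in groups.items() if runs}


# Excerpt copied from the results above (default fetchSize only).
SAMPLE = """\
Beeline http default fetchSize=1000
6.29user 0.74system 0:06.97elapsed 100%CPU
6.81user 0.64system 0:07.02elapsed 106%CPU
6.28user 0.53system 0:06.64elapsed 102%CPU
Beeline knox http default fetchSize=1000
6.61user 0.51system 0:07.90elapsed 90%CPU
6.50user 0.67system 0:07.74elapsed 92%CPU
6.50user 0.44system 0:07.49elapsed 92%CPU
"""

for name, seconds in mean_elapsed(SAMPLE).items():
    print(f"{name}: {seconds:.2f}s mean elapsed")
```

Averaging elapsed time rather than user or system time keeps the focus on what a client waiting for results would observe. With only three runs per configuration, the means are a rough guide rather than a statistically robust comparison.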
> Hive "select *" performance evaluation
> --------------------------------------
>
> Key: KNOX-1524
> URL: https://issues.apache.org/jira/browse/KNOX-1524
> Project: Apache Knox
> Issue Type: Task
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Major
> Fix For: 1.2.0
>
>
> While looking at WebHDFS performance in KNOX-1221, I decided to look a bit
> more into performance for common use cases. Hive performance is another area
> that could use some research.
> Use "select * ... limit" to get a comparison of raw return speed from
> HiveServer2. This should show how fast results can be streamed through
> HiveServer2 and Knox. Compare the results to "hdfs dfs -text" since this will
> render the data directly from HDFS. This should give comparisons for the
> difference in overhead between HDFS, HiveServer2 binary, HiveServer2 HTTP,
> and HiveServer2 HTTP with Knox.