[ 
https://issues.apache.org/jira/browse/KNOX-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673639#comment-16673639
 ] 

Kevin Risden commented on KNOX-1524:
------------------------------------

Here are the results from Knox master commit: 
488e4445cf7e37c0c645b65c77c3a95c06500cfe run with 
[https://github.com/risdenk/knox-performance-tests/tree/knox-nightly]

Knox is performing basically the same as HS2 HTTP mode. With fetchSize=10000, 
binary and HTTP mode are also pretty close. There are still some improvements 
to be made to HiveServer2, but Knox performance is much better now after 
KNOX-1530.
{code:java}
Select 200000 rows from a ~1GB file (1000000 rows, each 1000 characters wide)

HDFS -text
2.43user 0.39system 0:02.10elapsed 134%CPU
2.55user 0.32system 0:02.03elapsed 140%CPU
2.39user 0.35system 0:02.01elapsed 136%CPU

Beeline binary default fetchSize=1000
5.36user 0.50system 0:05.54elapsed 105%CPU
5.29user 0.56system 0:05.33elapsed 109%CPU
5.22user 0.63system 0:05.54elapsed 105%CPU

Beeline http default fetchSize=1000
6.29user 0.74system 0:06.97elapsed 100%CPU
6.81user 0.64system 0:07.02elapsed 106%CPU
6.28user 0.53system 0:06.64elapsed 102%CPU

Beeline knox http default fetchSize=1000
6.61user 0.51system 0:07.90elapsed 90%CPU
6.50user 0.67system 0:07.74elapsed 92%CPU
6.50user 0.44system 0:07.49elapsed 92%CPU

Beeline binary fetchSize=10000
6.09user 0.61system 0:06.77elapsed 98%CPU
6.25user 0.52system 0:06.94elapsed 97%CPU
5.93user 0.72system 0:07.06elapsed 94%CPU

Beeline http fetchSize=10000
7.05user 0.72system 0:07.73elapsed 100%CPU
7.38user 0.60system 0:07.87elapsed 101%CPU
7.11user 0.56system 0:07.55elapsed 101%CPU

Beeline knox http fetchSize=10000
7.03user 0.62system 0:07.90elapsed 96%CPU
7.27user 0.53system 0:07.81elapsed 99%CPU
7.19user 0.55system 0:07.57elapsed 102%CPU{code}
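
To make the comparison easier to read, here is a minimal sketch that averages the elapsed times from the runs above and computes the relative overhead between modes. The elapsed values are copied from the time output in the results; the dictionary keys and the helper names (`elapsed_seconds`, `overhead`) are made up for illustration and are not part of the knox-performance-tests repository.

```python
from statistics import mean

# Elapsed fields copied from the time(1) lines in the results above.
# The labels are illustrative names, not identifiers from the test repo.
RAW = {
    "hdfs -text": ["0:02.10", "0:02.03", "0:02.01"],
    "binary, fetchSize=1000": ["0:05.54", "0:05.33", "0:05.54"],
    "http, fetchSize=1000": ["0:06.97", "0:07.02", "0:06.64"],
    "knox http, fetchSize=1000": ["0:07.90", "0:07.74", "0:07.49"],
    "binary, fetchSize=10000": ["0:06.77", "0:06.94", "0:07.06"],
    "http, fetchSize=10000": ["0:07.73", "0:07.87", "0:07.55"],
    "knox http, fetchSize=10000": ["0:07.90", "0:07.81", "0:07.57"],
}


def elapsed_seconds(stamp: str) -> float:
    """Convert an 'M:SS.ss' elapsed field into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Mean elapsed wall-clock time per configuration, in seconds.
MEANS = {
    name: mean(elapsed_seconds(s) for s in stamps)
    for name, stamps in RAW.items()
}


def overhead(slower: str, faster: str) -> float:
    """Relative difference in mean elapsed time (0.10 means 10% slower)."""
    return MEANS[slower] / MEANS[faster] - 1.0


if __name__ == "__main__":
    for name, value in MEANS.items():
        print(f"{name:28s} {value:5.2f}s")
    for size in ("1000", "10000"):
        knox = f"knox http, fetchSize={size}"
        http = f"http, fetchSize={size}"
        print(f"knox vs http, fetchSize={size}: {overhead(knox, http):+.1%}")
```

On these numbers, Knox averages roughly 12% slower than direct HS2 HTTP at fetchSize=1000 and under 1% slower at fetchSize=10000. With only three runs per configuration, treat these as rough indications rather than precise measurements.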

> Hive "select *" performance evaluation
> --------------------------------------
>
>                 Key: KNOX-1524
>                 URL: https://issues.apache.org/jira/browse/KNOX-1524
>             Project: Apache Knox
>          Issue Type: Task
>            Reporter: Kevin Risden
>            Assignee: Kevin Risden
>            Priority: Major
>             Fix For: 1.2.0
>
>
> While looking at WebHDFS performance in KNOX-1221, I decided to look a bit 
> more into performance for common use cases. Hive performance is another area 
> that could use some research.
> Use "select * ... limit" to get a comparison of raw return speed from 
> HiveServer2. This should show how fast results can be streamed through 
> HiveServer2 and Knox. Compare the results to "hdfs dfs -text" since this will 
> render the data directly from HDFS. This should give comparisons for the 
> difference in overhead between HDFS, HiveServer2 binary, HiveServer2 HTTP, 
> and HiveServer2 HTTP with Knox.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
