[
https://issues.apache.org/jira/browse/KNOX-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652353#comment-16652353
]
Kevin Risden commented on KNOX-1524:
------------------------------------
Some interesting new results with Hive 4.0.0-SNAPSHOT at commit
d7be4b9f26345439c472969461d3d2c81f7e5057.
HIVE-20621 didn't seem to have an effect on performance (positive or negative).
HIVE-17194 looks like it caused a performance degradation, at least for the test
case I was running.
Results for 2 million rows:
* HDFS native - ~2.2 seconds
* Hive binary - ~11.0 seconds
* Hive HTTP - ~19.1 seconds
* Hive HTTP without HS2 compression - ~13.1 seconds
* Hive HTTP with Knox - ~24.8 seconds
* Hive HTTP with Knox without HS2 compression - ~19.3 seconds
I used "--hiveconf hive.server2.thrift.http.compression.enabled=false" to
disable compression for HiveServer2.
That brings the HiveServer2 HTTP and binary modes closer in performance to
each other. Knox also supports compression by default, so I'm curious whether
Knox compression is causing the remaining bottleneck.
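To make the timings above easier to compare, here is a minimal Python sketch
that expresses each one as a multiple of the HDFS native baseline and computes
the differences between configurations. The numbers are copied from the list
above; the names TIMINGS, slowdown and delta are made up for illustration and
are not part of Hive or Knox.

```python
# Timings in seconds for 2 million rows, copied from the results listed above.
TIMINGS = {
    "HDFS native": 2.2,
    "Hive binary": 11.0,
    "Hive HTTP": 19.1,
    "Hive HTTP without HS2 compression": 13.1,
    "Hive HTTP with Knox": 24.8,
    "Hive HTTP with Knox without HS2 compression": 19.3,
}


def slowdown(name, baseline="HDFS native"):
    """Return how many times slower `name` is than `baseline`, to one decimal."""
    return round(TIMINGS[name] / TIMINGS[baseline], 1)


def delta(slower, faster):
    """Return the difference in seconds between two configurations, to one decimal."""
    return round(TIMINGS[slower] - TIMINGS[faster], 1)


for name, seconds in TIMINGS.items():
    print(f"{name}: {seconds:.1f}s ({slowdown(name):.1f}x HDFS native)")

# Time saved by disabling HS2 compression, without and with Knox in the path.
print("HS2 compression cost (no Knox):",
      delta("Hive HTTP", "Hive HTTP without HS2 compression"))
print("HS2 compression cost (with Knox):",
      delta("Hive HTTP with Knox", "Hive HTTP with Knox without HS2 compression"))

# Time added by Knox, with and without HS2 compression.
print("Knox overhead (HS2 compression on):",
      delta("Hive HTTP with Knox", "Hive HTTP"))
print("Knox overhead (HS2 compression off):",
      delta("Hive HTTP with Knox without HS2 compression",
            "Hive HTTP without HS2 compression"))
```

On these numbers, disabling HS2 compression saves about 6.0 seconds without
Knox and about 5.5 seconds with Knox. Putting Knox in front adds about 5.7
seconds with HS2 compression and about 6.2 seconds without it. These are
single measurements from one test case, so the differences are only rough
indications.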
> Hive "select *" performance evaluation
> --------------------------------------
>
> Key: KNOX-1524
> URL: https://issues.apache.org/jira/browse/KNOX-1524
> Project: Apache Knox
> Issue Type: Task
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Major
> Fix For: 1.2.0
>
>
> While looking at WebHDFS performance in KNOX-1221, I decided to look a bit
> more into performance for common use cases. Hive performance is another area
> that could use some research.
> Use "select * ... limit" to get a comparison of raw return speed from
> HiveServer2. This should show how fast results can be streamed through
> HiveServer2 and Knox. Compare the results to "hdfs dfs -text" since this will
> render the data directly from HDFS. This should give comparisons for the
> difference in overhead between HDFS, HiveServer2 binary, HiveServer2 HTTP,
> and HiveServer2 HTTP with Knox.