[
https://issues.apache.org/jira/browse/IMPALA-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914506#comment-16914506
]
Sahil Takiar commented on IMPALA-8888:
--------------------------------------
After talking with Tim offline, it seems that using a JDBC driver might be
better than impala-shell: impala-shell is slow enough that server-side perf
improvements to this code probably don't noticeably affect the latency it
reports. So I will benchmark with JDBC instead.
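
A minimal sketch of what such a JDBC benchmark could look like, using only
the standard java.sql API. The class name, the fetch sizes, and taking the
JDBC URL from the command line are assumptions for illustration, not the
harness actually used; the two queries are the ones from the issue
description. It measures client-side wall-clock time only, and that time
includes query execution as well as the fetch; server CPU usage would have
to be measured separately.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical client-side fetch benchmark; not part of the Impala code base. */
class FetchBenchmark {

    /** One timed run: number of rows fetched and wall-clock time in nanoseconds. */
    static final class Run {
        final long rows;
        final long elapsedNanos;

        Run(long rows, long elapsedNanos) {
            this.rows = rows;
            this.elapsedNanos = elapsedNanos;
        }

        /** Throughput in rows per second; 0.0 if no time was recorded. */
        double rowsPerSecond() {
            return elapsedNanos <= 0 ? 0.0 : rows * 1e9 / elapsedNanos;
        }
    }

    /** Runs the query and drains the whole result set with the given fetch size. */
    static Run timeFetch(Connection conn, String sql, int fetchSize) throws SQLException {
        long start = System.nanoTime();
        long rows = 0;
        try (Statement stmt = conn.createStatement()) {
            // setFetchSize is only a hint; a driver may ignore or cap it.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // count rows only; column values are not read
                }
            }
        }
        return new Run(rows, System.nanoTime() - start);
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: FetchBenchmark <jdbc-url>");
            return;
        }
        String[] queries = {
            "select l_orderkey from tpch_parquet.lineitem",
            "select * from tpch_parquet.orders",
        };
        int[] fetchSizes = {1024, 10240, 102400}; // assumed values, tune as needed
        try (Connection conn = DriverManager.getConnection(args[0])) {
            for (String sql : queries) {
                for (int fetchSize : fetchSizes) {
                    Run run = timeFetch(conn, sql, fetchSize);
                    System.out.printf("fetchSize=%d rows=%d ms=%.1f rows/s=%.0f  %s%n",
                            fetchSize, run.rows, run.elapsedNanos / 1e6,
                            run.rowsPerSecond(), sql);
                }
            }
        }
    }
}
```

Running each configuration several times and discarding the first run
would reduce noise from warm-up and caching.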
> Profile fetch performance when result spooling is enabled
> ---------------------------------------------------------
>
> Key: IMPALA-8888
> URL: https://issues.apache.org/jira/browse/IMPALA-8888
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> Profile the performance of fetching rows when result spooling is enabled.
> There are a few queries that can be used to benchmark the performance:
> {{time ./bin/impala-shell.sh -B -q "select l_orderkey from
> tpch_parquet.lineitem" > /dev/null}}
> {{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" >
> /dev/null}}
> The first fetches one column and 6,001,215 rows; the second fetches 9 columns
> and 1,500,000 rows, so together they cover a mix of rows fetched vs. columns
> fetched.
> The baseline for the benchmark should be the commit prior to IMPALA-8780.
> The benchmark should check for both latency and CPU usage (to see if the copy
> into {{BufferedTupleStream}} has a significant overhead).
> Various fetch sizes should be used in the benchmark as well, to see whether
> increasing the fetch size for result spooling improves performance (ideally
> it should). It would also be nice to run some fetches between machines, as
> that will better reflect network round-trip latencies.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)