[ 
https://issues.apache.org/jira/browse/IMPALA-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938718#comment-16938718
 ] 

Sahil Takiar commented on IMPALA-8888:
--------------------------------------

After running many profiling experiments, I have concluded that for a small 
number of rows returned, result spooling adds zero to negligible performance 
overhead. For large table scans (e.g. a full table scan of catalog_sales), 
result spooling *improves* performance by up to 70% in my perf experiments.

> Profile fetch performance when result spooling is enabled
> ---------------------------------------------------------
>
>                 Key: IMPALA-8888
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8888
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Profile the performance of fetching rows when result spooling is enabled. 
> There are a few queries that can be used to benchmark the performance:
> {{time ./bin/impala-shell.sh -B -q "select l_orderkey from 
> tpch_parquet.lineitem" > /dev/null}}
> {{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > 
> /dev/null}}
> The first fetches one column and 6,001,215 rows; the second fetches 9 columns 
> and 1,500,000 rows, so a mix of rows fetched vs. columns fetched.
> The baseline for the benchmark should be the commit prior to IMPALA-8780.
> The benchmark should check for both latency and CPU usage (to see if the copy 
> into {{BufferedTupleStream}} has a significant overhead).
> Various fetch sizes should be used in the benchmark as well, to see if 
> increasing the fetch size for result spooling improves performance (ideally 
> it should). It would also be nice to run some fetches between machines, as 
> that will better reflect network round trip latencies.
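
For anyone repeating the benchmark described in the issue, a minimal sketch of a 
timing harness follows. It is not the script used for the numbers above. It 
assumes a Unix host (the resource module is Unix-only) and an Impala 
development checkout as the working directory; the helper names 
(build_command, time_command, benchmark, main) are hypothetical. It records 
wall-clock latency and the CPU time of the impala-shell client process only, 
so server-side CPU still has to be read from the query profile or another 
tool. Fetch size is deliberately left out because how it is set depends on 
the client in use.

```python
import resource
import statistics
import subprocess
import time

# Queries copied from the issue description.
QUERIES = [
    "select l_orderkey from tpch_parquet.lineitem",
    "select * from tpch_parquet.orders",
]


def build_command(query, shell="./bin/impala-shell.sh"):
    """Return the argv list for one benchmark query (same flags as the issue)."""
    return [shell, "-B", "-q", query]


def time_command(argv):
    """Run argv once, discarding stdout; return (wall_seconds, cpu_seconds).

    cpu_seconds is user + system time of the child process, i.e. the
    client side only.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


def benchmark(argv, runs=5):
    """Time argv several times and summarize the samples."""
    samples = [time_command(argv) for _ in range(runs)]
    walls = [wall for wall, _ in samples]
    cpus = [cpu for _, cpu in samples]
    return {
        "runs": runs,
        "wall_median_s": statistics.median(walls),
        "wall_min_s": min(walls),
        "cpu_median_s": statistics.median(cpus),
    }


def main():
    """Benchmark every query in QUERIES and print one summary per query."""
    for query in QUERIES:
        print(query, benchmark(build_command(query)))
```

Calling main() once on the baseline commit and once with result spooling 
enabled gives two sets of medians to compare; using the median rather than 
the mean keeps a single slow run from dominating the result.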



