[jira] [Updated] (IMPALA-8656) Support for eagerly fetching and spooling all query result rows

Alex Rodoni (Jira) Tue, 24 Sep 2019 14:20:10 -0700


     [ https://issues.apache.org/jira/browse/IMPALA-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alex Rodoni updated IMPALA-8656:
--------------------------------
    Description: 
Impala's current interaction with clients is pull-based: it relies on clients 
to fetch results to trigger the generation of more result row batches until all 
the result rows have been produced. If a client issues a query without fetching 
all the results, the query fragments will continue to consume resources until 
the query is cancelled and unregistered for whatever reason. This is 
undesirable because resources are held up by misbehaving clients, and other 
queries may wait for an extended period of time in admission control as a result.

The high-level idea for this JIRA is for Impala to have a mode in which result 
sets of queries are eagerly fetched and spooled somewhere (preferably some 
persistent storage). In this way, the cluster's resources are freed up once all 
result rows have been fetched and stored in the spooling location. Incoming 
client fetches can then be served from this spooled location.
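
To make the contrast with the pull-based model concrete, here is a minimal 
Python sketch of the eager-spooling idea. It is a toy model, not Impala's 
implementation: the names ResultSpool, produce_batches and resources_released 
are hypothetical, rows are plain integers, and the sketch ignores memory limits 
and spilling to persistent storage. It only illustrates that the producer can 
be drained to completion before the client fetches anything.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

RowBatch = List[int]


def produce_batches(num_batches: int, batch_size: int) -> Iterator[RowBatch]:
    """Hypothetical stand-in for the query fragments that generate result rows."""
    for b in range(num_batches):
        start = b * batch_size
        yield list(range(start, start + batch_size))


class ResultSpool:
    """Toy spool (assumption, not Impala code): drains the producer eagerly,
    then serves client fetches from the buffered batches."""

    def __init__(self, producer: Iterable[RowBatch]) -> None:
        self._batches: Deque[RowBatch] = deque()
        self.resources_released = False
        # Eagerly consume every batch without waiting for a client fetch.
        for batch in producer:
            self._batches.append(batch)
        # The producer is exhausted, so in this model the query-side
        # resources could be freed even though nothing has been fetched yet.
        self.resources_released = True

    def fetch(self, max_rows: int) -> RowBatch:
        """Return up to max_rows rows from the spool, in production order."""
        rows: RowBatch = []
        while self._batches and len(rows) < max_rows:
            batch = self._batches[0]
            take = max_rows - len(rows)
            rows.extend(batch[:take])
            if take >= len(batch):
                self._batches.popleft()          # batch fully consumed
            else:
                self._batches[0] = batch[take:]  # keep the unread remainder
        return rows


if __name__ == "__main__":
    spool = ResultSpool(produce_batches(num_batches=3, batch_size=4))
    print(spool.resources_released)  # True, before any client fetch
    print(spool.fetch(5))            # [0, 1, 2, 3, 4]
    print(spool.fetch(100))          # [5, 6, 7, 8, 9, 10, 11]
```

In the pull-based model, by contrast, each call to fetch would itself drive 
the producer, so a client that stops fetching would leave the producer (and 
the resources behind it) alive indefinitely.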

Query options: 
SPOOL_QUERY_RESULT
MAX_RESULT_SPOOLING_MEM
MAX_SPILLED_RESULT_SPOOLING_MEM
FETCH_ROWS_TIMEOUT_MS (https://issues.apache.org/jira/browse/IMPALA-7312)


cc'ing [~stakiar], [~twm378], [~joemcdonnell], [~lv]

  was:
Impala's current interaction with clients is pull-based: it relies on clients 
to fetch results to trigger the generation of more result row batches until all 
the result rows have been produced. If a client issues a query without fetching 
all the results, the query fragments will continue to consume resources until 
the query is cancelled and unregistered for whatever reason. This is 
undesirable because resources are held up by misbehaving clients, and other 
queries may wait for an extended period of time in admission control as a result.

The high-level idea for this JIRA is for Impala to have a mode in which result 
sets of queries are eagerly fetched and spooled somewhere (preferably some 
persistent storage). In this way, the cluster's resources are freed up once all 
result rows have been fetched and stored in the spooling location. Incoming 
client fetches can then be served from this spooled location.

Query options: 
SPOOL_QUERY_RESULT
MAX_RESULT_SPOOLING_MEM
MAX_SPILLED_RESULT_SPOOLING_MEM
FETCH_ROWS_TIMEOUT_MS

cc'ing [~stakiar], [~twm378], [~joemcdonnell], [~lv]


> Support for eagerly fetching and spooling all query result rows
> ---------------------------------------------------------------
>
>                 Key: IMPALA-8656
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8656
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.12.0, Impala 3.2.0
>            Reporter: Michael Ho
>            Assignee: Sahil Takiar
>            Priority: Critical
>
> Impala's current interaction with clients is pull-based: it relies on 
> clients to fetch results to trigger the generation of more result row batches 
> until all the result rows have been produced. If a client issues a query 
> without fetching all the results, the query fragments will continue to 
> consume resources until the query is cancelled and unregistered for whatever 
> reason. This is undesirable because resources are held up by misbehaving 
> clients, and other queries may wait for an extended period of time in 
> admission control as a result.
> The high-level idea for this JIRA is for Impala to have a mode in which 
> result sets of queries are eagerly fetched and spooled somewhere (preferably 
> some persistent storage). In this way, the cluster's resources are freed up 
> once all result rows have been fetched and stored in the spooling location. 
> Incoming client fetches can then be served from this spooled location.
> Query options: 
> SPOOL_QUERY_RESULT
> MAX_RESULT_SPOOLING_MEM
> MAX_SPILLED_RESULT_SPOOLING_MEM
> FETCH_ROWS_TIMEOUT_MS (https://issues.apache.org/jira/browse/IMPALA-7312)
> cc'ing [~stakiar], [~twm378], [~joemcdonnell], [~lv]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

