[ 
https://issues.apache.org/jira/browse/IMPALA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6679.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0
                   Impala 2.13.0

> Don't claim ideal reservation in scanner until actually processing scan ranges
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-6679
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6679
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> One cause of the memory regression in IMPALA-4835 was that scans, 
> particularly Parquet, claimed memory very aggressively in 
> HdfsScanNode::Open() based on the planner's estimated ideal memory. There 
> were two problems here:
> # In many cases the ideal memory increase was not needed at all, because the 
> total amount of data scanned in the Parquet file was actually less than the 
> minimum reservation. We don't know this for sure until we read the Parquet 
> footer and check the column sizes.
> # HdfsScanNode::Open() may happen a long time before the first 
> HdfsScanNode::GetNext(), particularly if the scan node is on the left side of 
> a broadcast join. The scan node may grab a lot of reservation before other 
> plan nodes have started running, eventually starving many of those nodes of 
> memory (see the sketch after this list).
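> Roughly, the eager claim looked like the sketch below. This is a hedged 
> illustration, not the actual Impala source: the member and method names 
> (resource_profile_, buffer_pool_client_, IncreaseReservationToFit()) are 
> assumptions about the backend's buffer pool API.
> {code:cpp}
> // Sketch of the pre-fix behaviour: claim the planner's "ideal" reservation
> // up front in Open(), long before any scan range is processed.
> Status HdfsScanNode::Open(RuntimeState* state) {
>   RETURN_IF_ERROR(ScanNode::Open(state));
>   // Eager claim: grabs the full estimated ideal memory even though the
>   // Parquet footer may later show that far less is actually needed.
>   int64_t ideal_bytes = resource_profile_.ideal_reservation;  // assumed field
>   buffer_pool_client_->IncreaseReservationToFit(ideal_bytes);
>   return Status::OK();
> }
> {code}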
> It would make sense to wait until we are actually processing a scan range to 
> increase a scanner thread's reservation from the minimum (sketched below). 
> There are various more complex things we could do here, but I suspect the 
> simplest possible thing would help a lot. E.g. we could extend this by 
> releasing reservation after processing each scan range.
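> A minimal sketch of the deferred approach, under the same caveats as above 
> (ComputeIdealReservation() and the queue/loop structure are hypothetical):
> {code:cpp}
> // Sketch: start each scanner thread at the minimum reservation and only
> // grow it once a concrete scan range is picked up and its footer is read.
> void HdfsScanNode::ScannerThread() {
>   io::ScanRange* range;
>   while (GetNextScanRange(&range)) {  // hypothetical dequeue helper
>     // After reading the footer we know the real column sizes, so we can
>     // compute what this range actually needs instead of trusting the
>     // planner's estimate.
>     int64_t needed = ComputeIdealReservation(range);  // hypothetical
>     int64_t current = buffer_pool_client_->GetReservation();
>     if (needed > current) {
>       // Best effort: if the increase fails we keep scanning with the
>       // minimum reservation rather than failing the query.
>       buffer_pool_client_->IncreaseReservationToFit(needed - current);
>     }
>     ProcessScanRange(range);
>     // Possible extension from above: shrink back toward the minimum here
>     // so other plan nodes can use the memory sooner.
>   }
> }
> {code}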
> I believe the second effect may be fairly significant when running many 
> queries with high concurrency. In that case the async join build will tend to 
> be disabled, which means that HdfsScanNode::GetNext() on the left child of a 
> broadcast join will only be called after the join build has completed, so 
> overlap between that scan and the nodes on the right side of the join would 
> be minimal (there's not enough synchronisation to guarantee no overlap).
> See 
> https://docs.google.com/document/d/1kR0zfevNNUJom3sH1XmposacVZ-QALan7NSwnR5CkSA/edit#
>  for some more analysis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
