Hi Kavinder,

If you are looking for the guy "to be blamed" for the decision to invoke the planner twice on the master, unfortunately, that guy would be me. :)
First of all, I left the company (Pivotal Inc.) about half a year ago and am no longer working on the Apache HAWQ project full time. I have had little time to keep track of the code changes these days, so what I am going to say may be inconsistent with the latest code.

*Problem*

To improve the scalability and throughput of the system, we wanted to implement a dynamic resource allocation mechanism for HAWQ 2.0. In other words, we wanted to allocate exactly the resources needed (the number of virtual segments), rather than a fixed amount of resources (as in HAWQ 1.x), to run the underlying query. To achieve that, we needed to calculate the cost of the query before allocating the resources, which means we needed to figure out the execution plan first. However, in the framework of the current planner (either the old planner or the new optimizer ORCA), the optimal execution plan is generated given the to-be-run query and the number of segments. Thus, this is a chicken-and-egg problem.

*Solution*

IMO, the ideal solution to the above problem is an iterative algorithm: given a default number of segments, calculate the optimal execution plan; based on that plan, figure out the appropriate number of segments needed to run the query; then calculate the optimal execution plan again, and again, until the result is stable.

*Implementation*

In the actual implementation, we set the number of iterations to 2 for two major reasons: (1) two iterations are enough to give a good result; (2) there is some cost associated with invoking the planner, especially the new optimizer ORCA.

After implementing the first version, we found that determining the number of virtual segments based on the cost of the query sometimes gave very bad results (although this is really an issue with the planner, since the cost the planner provides does not reflect the actual running cost of the query well). So, borrowing the idea from Hadoop MapReduce, we now calculate the cost based on the total size of all the tables that need to be scanned for the underlying query. With that change, it seemed we no longer needed to invoke the planner before allocating resources. However, in our current resource manager, the allocated resource is segment-based, not process-based. For example, if an execution plan consists of three slices, we need to set up three processes on each segment to run the query, and one allocated resource unit (virtual segment) covers all three processes. To avoid the case where too many processes are started on one physical host, we need to know how many processes (the number of slices of the execution plan) will start on each virtual segment when we request resources from the resource manager. Thus, the execution plan is still needed. We could write a new function to calculate the number of slices of the plan rather than invoking the planner, but after some investigation, we found that the new function did almost the same thing as the planner. So, why bother writing more duplicated code?
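To make the above a bit more concrete, here is a rough, purely illustrative sketch of the two-pass flow in Python. None of these names correspond to real HAWQ functions, and the per-vseg sizing constant is just an assumption:

import math

# Hypothetical sketch of the two-pass planning / resource negotiation flow
# described above, NOT the real HAWQ code; every name here is made up.

BYTES_PER_VSEG = 128 * 1024 * 1024   # assumed data volume one vseg handles

def vsegs_from_scan_size(total_scan_bytes, max_vsegs):
    """MapReduce-style heuristic: size the request by the data to be scanned."""
    return min(max_vsegs, max(1, math.ceil(total_scan_bytes / BYTES_PER_VSEG)))

def plan_query(query, default_vsegs, max_vsegs):
    # Pass 1: plan with a default number of virtual segments, mainly to
    # learn the shape of the plan -- in particular its number of slices.
    first_plan = invoke_planner(query, default_vsegs)        # hypothetical
    slices = num_slices(first_plan)                          # hypothetical

    # Resources are allocated per virtual segment, and each vseg starts one
    # process per slice, so the request carries the slice count to avoid
    # packing too many processes onto one physical host.
    wanted = vsegs_from_scan_size(total_scan_size(query), max_vsegs)
    granted = resource_manager_allocate(wanted, slices)      # hypothetical

    # Pass 2: re-plan against the vsegs actually granted, so the final plan
    # matches the resources the query will really run with.
    final_plan = invoke_planner(query, granted)
    return final_plan, granted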
*Engineering Consideration*

IMO, in the long term, the best solution is probably to embed the logic of resource negotiation into the planner. In that case, the output of the planner would consist of the needed number of virtual segments together with the associated optimal execution plan, and the planner could be invoked just once on the master. However, back then we decided to separate the functionalities of resource negotiation and the planner completely. Although it may look a little ugly from an architecture point of view, it saved us a lot of code refactoring effort and communication cost among different teams. We did have a release deadline, :).

Above is just my 2 cents.

Best,
Lirong

Lirong Jian
HashData Inc.

2016-08-30 1:42 GMT+08:00 Goden Yao <[email protected]>:

> Some background info:
>
> HAWQ 2.0 right now doesn't do dynamic resource allocation for PXF queries
> (External Table).
> It was a compromise we made as PXF used to have its own allocation logic
> and we didn't get a chance to converge the logic with HAWQ 2.0.
> So to make it compatible (on performance) with 1.x HAWQ, the current logic
> will assume external table queries need 8 segments per node to execute.
> (e.g. if 3 nodes in the cluster, it'll need 24 segments).
> If that allocation fails, the query will fail and user will see the error
> message like "do not have sufficient resources" or "segments" to execute
> the query.
>
> As I understand, the 1st call is to get fragment info, 2nd call is to
> optimize allocation for fragments to segments based on the info got from
> 1st call and generate the optimized plan.
>
> -Goden
>
> On Mon, Aug 29, 2016 at 10:31 AM Kavinder Dhaliwal <[email protected]>
> wrote:
>
> > Hi,
> >
> > Recently I was looking into the issue of PXF receiving multiple REST
> > requests to the fragmenter. Based on offline discussions I have got a
> > rough idea that this is happening because HAWQ plans every query twice
> > on the master. I understand that this is to allow resource negotiation
> > that was a feature of HAWQ 2.0. I'd like to know if anyone on the
> > mailing list can give any more background into the history of the
> > decision making behind this change for HAWQ 2.0 and whether this is
> > only a short term solution
> >
> > Thanks,
> > Kavinder
> >
