Jon,

Sorry for the late response; I missed this one.

Regarding the memory quota of one vseg, I understand your consideration
now, and it makes sense. What is your expectation for the memory limit,
32GB or 64GB? I can open a JIRA to track this and decide which limit to
extend to.

As for hawq_rm_nvseg_perquery_limit: if you don't want it to limit your
query's resources in a large cluster, just set it to a large value, 65535
for example. Then you can achieve your goal of letting
hawq_rm_nvseg_perquery_perseg_limit dynamically limit the number of vsegs
for one query, as in the sketch below.
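A minimal sketch of those settings in a psql session (the values are only
examples, and this assumes both GUCs are adjustable at the session level;
otherwise they would go in hawq-site.xml):

    SET hawq_rm_nvseg_perquery_limit = 65535;     -- effectively removes the cluster-wide cap
    SET hawq_rm_nvseg_perquery_perseg_limit = 6;  -- the per-datanode cap now governs, so it scales with cluster size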
Yi

On Tue, Jan 24, 2017 at 3:21 AM, Jon Roberts <[email protected]> wrote:

> It is common in Greenplum to use "explain analyze" to look at the query
> plan; if the query spills to disk, increasing the statement memory will
> typically increase the speed of the query. With hash-distributed tables
> in HAWQ this will be true too, and it is the only way, short of
> rewriting the query, to make it perform better.
>
> For the really large Greenplum clusters out there, we don't see benefits
> from running 8 segments per host. Instead, we dial this back to 4
> segments because the network traffic in the interconnect becomes the
> bottleneck. For HAWQ, where the clusters will likely be much bigger, I
> think we will have the same network limitations. So with random
> distribution, we will probably reduce the number of vsegs to get better
> performance on the network side and then use more memory per vseg to
> prevent spilling to disk and use more of the available resources. That
> is why we need the ability to allocate more than 16GB of RAM per vseg.
>
> On a different but related note, I would like to see
> hawq_rm_nvseg_perquery_limit changed to determine the number of vsegs
> per query at the datanode level rather than across the entire cluster.
> That would make it much easier to expand and shrink a cluster without
> having to adjust GUCs each time.
>
> Jon Roberts
> Principal Engineer | [email protected] | 615-426-8661
>
> On Sun, Jan 22, 2017 at 10:05 PM, Yi Jin <[email protected]> wrote:
>
> > Hi Jon,
> >
> > For scenario 1, it is true. However, do you think it is necessary to
> > give one vseg such a high memory quota? Of course, if we force
> > hawq_rm_stmt_nvseg to a value different from the bucket number, the
> > optimizer will give a different plan. This is a way to force high
> > concurrency and high resource occupation.
> >
> > For scenario 2, if we want more concurrency in executing one query, we
> > should advise customers to increase hawq_rm_nvseg_perquery_limit
> > (default 512) and hawq_rm_nvseg_perquery_perseg_limit (default 6), and
> > then choose an appropriate vseg memory quota; as a result, HAWQ will
> > decide to leverage more concurrency and more resources for big queries
> > automatically. I mean HAWQ will choose the number of virtual segments
> > automatically, not always a high number of virtual segments.
> >
> > For scenario 3, it is recommended not to assign all system memory to
> > HAWQ on the segment side. It is a good idea to leave some for the
> > system and for the other applications.
> >
> > I think it is fine to support a higher memory limit for one virtual
> > segment, but I still have no idea why it is necessary.
> >
> > As HAWQ uses a different memory allocation strategy compared with
> > Greenplum, the external behavior of HAWQ is different. The virtual
> > segment (vseg) is a critical concept in HAWQ for managing resource
> > consumption and query QE concurrency. In Greenplum, the QE concurrency
> > is fixed, i.e. the number of QEs for one query execution is fixed;
> > this is why a Greenplum user just needs to think about memory
> > consumption in one segment, as all segments consume the same amount of
> > memory for the same query. In HAWQ, by contrast, this number is
> > dynamic: you can easily change the resource queue definition or the
> > statement resource definition to change it, and you don't have to
> > install and configure more segments on those nodes to force the QE
> > concurrency.
> >
> > Consider a scenario where many small-scale queries arrive
> > concurrently: HAWQ allocates just a small number of vsegs for each
> > query, without wasting resources, which makes HAWQ able to support
> > high concurrency much better than a traditional MPP-architecture
> > database and to achieve high throughput. Conversely, for large-scale
> > queries, HAWQ can automatically increase the number of vsegs to a very
> > high number if you set those limit GUCs high as well.
> >
> > Best,
> > Yi
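A sketch of the statement-level forcing Yi mentions under scenario 1 above
(illustrative values; as Yi notes further down the thread,
hawq_rm_stmt_vseg_memory only takes effect when hawq_rm_stmt_nvseg is
greater than 0):

    SET hawq_rm_stmt_vseg_memory = '4GB';  -- per-vseg quota for statements in this session
    SET hawq_rm_stmt_nvseg = 12;           -- a value > 0 forces the vseg count and activates the quota above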
> > On Mon, Jan 23, 2017 at 12:30 PM, Jon Roberts <[email protected]> wrote:
> >
> > > I've been thinking about these scenarios:
> > >
> > > 1. Hash-distributed tables with a fixed number of buckets.
> > > If the tables were built using the defaults, buckets = 6 * number of
> > > nodes, so you basically have 6 vsegs per host. Multiply that by 16GB
> > > and you can only use 96GB of the 256GB of RAM per node.
> > >
> > > 2. A user has random tables but doesn't understand they can increase
> > > the number of vsegs.
> > > This will be common for users coming from Greenplum. Again, they can
> > > only set statement memory to 16GB, so they are stuck with a max of
> > > 96GB of RAM usage.
> > >
> > > 3. A user increases vsegs and statement memory.
> > > They may run out of memory if too aggressive with the settings.
> > >
> > > - I think we should be able to specify statement memory higher than
> > > 16GB. Maybe the limit should be something much higher, such as 1TB.
> > >
> > > - The optimizer should limit the number of vsegs based on the
> > > statement memory setting to prevent OOM. You could do the opposite
> > > too (limit memory and use the vseg setting provided). Greenplum can
> > > limit the amount of memory, but we have two dials to adjust: vsegs
> > > and memory.
> > >
> > > Jon Roberts
> > >
> > > On Sun, Jan 22, 2017 at 5:20 PM, Yi Jin <[email protected]> wrote:
> > >
> > > > Hi Jon,
> > > >
> > > > That GUC setting means that for one virtual segment, the maximum
> > > > consumable memory is 16GB. For one segment/node, there may be
> > > > multiple vsegs allocated to run queries, so if a node has 256GB
> > > > expected to be consumed by HAWQ, it will have at most 16 vsegs
> > > > running concurrently.
> > > >
> > > > hawq_rm_stmt_vseg_memory sets the statement-level vseg memory
> > > > consumption; hawq_rm_stmt_nvseg must be specified as well. Only
> > > > when hawq_rm_stmt_nvseg is greater than 0 is
> > > > hawq_rm_stmt_vseg_memory activated, regardless of the target
> > > > resource queue's original vseg resource quota definition. For
> > > > example, you can set hawq_rm_stmt_vseg_memory to 16GB and
> > > > hawq_rm_stmt_nvseg to 256; if you have a cluster of 16 nodes with
> > > > 256GB each and your target resource queue can use 100% of cluster
> > > > resources, you will have 16 vsegs running per node, consuming all
> > > > memory resources for this query.
> > > >
> > > > Best,
> > > > Yi
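Yi's closing example above, written out as session settings (a sketch; it
assumes a 16-node cluster with 256GB per node and a resource queue allowed
100% of cluster resources):

    SET hawq_rm_stmt_vseg_memory = '16GB';
    SET hawq_rm_stmt_nvseg = 256;
    -- 256 vsegs / 16 nodes = 16 vsegs per node;
    -- 16 vsegs * 16GB = 256GB, i.e. all of each node's memory for this query.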
> > > > On Sat, Jan 21, 2017 at 3:40 PM, Lei Chang <[email protected]> wrote:
> > > >
> > > > > hawq_rm_stmt_vseg_memory and hawq_rm_stmt_nvseg need to be used
> > > > > together to set the specific number of segments and the vseg
> > > > > memory. And hawq_rm_stmt_nvseg should be less than
> > > > > hawq_rm_nvseg_perquery_perseg_limit.
> > > > >
> > > > > set hawq_rm_stmt_vseg_memory = '2GB'; set hawq_rm_stmt_nvseg = 6;
> > > > >
> > > > > 16GB looks somewhat small for big dedicated machines: if 16GB is
> > > > > the per-virtual-segment memory and 8 segments are used, it only
> > > > > uses 128GB.
> > > > >
> > > > > Cheers
> > > > > Lei
> > > > >
> > > > > On Fri, Jan 20, 2017 at 9:11 PM, Jon Roberts <[email protected]> wrote:
> > > > >
> > > > > > Why is there a limit of 16GB for hawq_rm_stmt_vseg_memory? A
> > > > > > cluster with 256GB per node and dedicated to HAWQ may certainly
> > > > > > want to utilize more memory per segment. Is there something I'm
> > > > > > missing regarding statement memory?
> > > > > >
> > > > > > Secondly, does the number of vsegs for a query get influenced
> > > > > > by the statement memory, or does it just look at the plan and
> > > > > > hawq_rm_nvseg_perquery_perseg_limit?
> > > > > >
> > > > > > Jon Roberts
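For reference, the 16GB cap Jon asks about at the start of the thread,
sketched in a session (illustrative; exact behavior and any error text may
vary by HAWQ version):

    SET hawq_rm_stmt_vseg_memory = '16GB';  -- accepted: the current upper bound per vseg
    SET hawq_rm_stmt_vseg_memory = '32GB';  -- expected to be rejected under the current limit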
