+1 for the detailed explanation. One more point: normally we do not recommend setting default_hash_table_bucket_number greater than hawq_rm_nvseg_perquery_limit (default 512). When initializing a large cluster, default_hash_table_bucket_number is adjusted accordingly: if default_hash_table_bucket_number > hawq_rm_nvseg_perquery_limit, it is adjusted to (hawq_rm_nvseg_perquery_limit / hostnumber) * hostnumber. If the cluster is later expanded, default_hash_table_bucket_number also needs to be set properly again.
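
To make the adjustment rule concrete, here is a minimal sketch in Python. It is not HAWQ's actual code: the function name is hypothetical, and it assumes the division in the formula is integer division, so the result is the largest multiple of the host count that does not exceed hawq_rm_nvseg_perquery_limit.

```python
def adjust_bucket_number(bucket_number: int, perquery_limit: int, host_count: int) -> int:
    """Hypothetical helper mirroring the adjustment rule described above.

    Assumption: '/' in the formula is integer division, which rounds the
    limit down to a multiple of host_count.
    """
    if bucket_number > perquery_limit:
        return (perquery_limit // host_count) * host_count
    # Values at or below the limit are left unchanged.
    return bucket_number


# Illustrative inputs: 100 hosts with 6 buckets per host gives 600 > 512.
print(adjust_bucket_number(600, 512, 100))  # 500
# 10 hosts with 6 buckets per host gives 60, which is below the limit.
print(adjust_bucket_number(60, 512, 10))    # 60
```

Under that assumption the adjusted value stays evenly divisible by the number of hosts.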
Jiali

On Wed, Jul 13, 2016 at 1:40 PM, Yi Jin <[email protected]> wrote:

> Hi Vineet,
>
> Some comments from me.
>
> For question 1.
>
> Yes, perquery_limit is introduced mainly to restrict resource usage in
> large-scale clusters; perquery_perseg_limit is to avoid allocating too many
> processes in one segment, which may cause serious performance issues. So,
> the two GUCs are for different performance aspects. As the cluster scale
> varies, only one of the two limits actually takes effect. We don't have to
> let both be active for resource allocation.
>
> For question 2.
>
> In fact, perquery_perseg_limit is a general resource restriction for all
> queries, not only hash table queries and external table queries; this is
> why this GUC is not merged with another one. For example, when we run some
> queries on randomly distributed tables, it does not make sense to let the
> resource manager refer to a GUC for hash tables.
>
> For the last topic item.
>
> In my opinion, it is not necessary to adjust hawq_rm_nvseg_perquery_limit;
> say, we just need to leave it unchanged and actually not active until we
> really want to run a large-scale HAWQ cluster, for example, 100+ nodes.
>
> Best,
> Yi
>
> On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel <[email protected]> wrote:
>
> > Hi all,
> >
> > I’m trying to document some GUC usage in detail and have questions on
> > hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> > tuning.
> >
> > *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> > *perquery_limit* for short.
> > *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call it
> > *perquery_perseg_limit* for short.
> >
> >
> > 1) Is there ever any benefit in having perquery_limit *greater than*
> > (perquery_perseg_limit * segment host count)?
> > For example, in a 10-node cluster, HAWQ will never allocate more than
> > (GUC default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512
> > doesn’t have any effect. It seems perquery_limit overrides (takes effect
> > over) perquery_perseg_limit only when its value is less than
> > (perquery_perseg_limit * segment host count).
> >
> > Is that the correct assumption? That would make sense, as users may want
> > to keep a check on how much processing a single query can take up (that
> > implies that the limit must be lower than the total possible v-segs). Or,
> > it may make sense in large clusters (100 nodes or more) where we need to
> > limit the pressure on HDFS.
> >
> >
> > 2) Now, if the purpose of hawq_rm_nvseg_perquery_limit is to keep a check
> > on single query resource usage (by limiting the # of v-segs), doesn’t it
> > affect default_hash_table_bucket_number, because queries will fail when
> > *default_hash_table_bucket_number* is greater than
> > hawq_rm_nvseg_perquery_limit? In that case, the purpose of
> > hawq_rm_nvseg_perquery_limit conflicts with the ability to run queries on
> > HASH dist tables. This then means that tuning
> > hawq_rm_nvseg_perquery_limit down is not a good idea, which seems to
> > conflict with the purpose of the GUC (in relation to other GUCs).
> >
> >
> > Perhaps someone can provide some examples of *how and when you would
> > tune hawq_rm_nvseg_perquery_limit* in this 10-node example:
> >
> > *Defaults on a 10-node cluster are:*
> > a) *hawq_rm_nvseg_perquery_perseg_limit* = 6 (hence the ability to spin
> > up 6 * 10 = 60 total v-segs for random tables)
> > b) *hawq_rm_nvseg_perquery_limit* = 512 (but HAWQ will never dispatch
> > more than 60 v-segs on a random table, so the value of 512 does not seem
> > practical)
> > c) *default_hash_table_bucket_number* = 60 (6 * 10)
> >
> >
> > Thanks
> > Vineet
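
The interplay of the two limits discussed in the quoted thread can be sketched as a simple minimum. This is a simplified model based only on the description above, not the resource manager's actual allocation logic, and the function name is hypothetical.

```python
def max_vsegs_per_query(perquery_limit: int, perseg_limit: int, host_count: int) -> int:
    """Hypothetical model: whichever of the two limits is smaller takes effect.

    perquery_limit -> hawq_rm_nvseg_perquery_limit (cluster-wide cap per query)
    perseg_limit   -> hawq_rm_nvseg_perquery_perseg_limit (cap per segment host)
    """
    return min(perquery_limit, perseg_limit * host_count)


# 10-node cluster with defaults: the per-segment limit is the effective one.
print(max_vsegs_per_query(512, 6, 10))   # 60
# 100-node cluster with defaults: the per-query limit becomes the effective one.
print(max_vsegs_per_query(512, 6, 100))  # 512
```

In this model the per-query limit only matters once perquery_perseg_limit * host count exceeds it, which with the defaults happens at 86 hosts or more.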
