Oh, I missed that there is default_hash_table_bucket_number which is calculated at init time to be 6 * number of nodes which is separate from hawq_rm_nvseg_perquery_perseg_limit for random and external tables. So it is already separate.
Any chance we can get the bucketnum from gp_distribution_policy displayed in psql when a user uses "\d" to define the table? Jon Roberts Principal Engineer | [email protected] | 615-426-8661 On Tue, Dec 6, 2016 at 2:53 PM, Yi Jin <[email protected]> wrote: > Hi Jon, > > I think to me it is a good news that we can increase > hawq_rm_nvseg_perquery_perseg_limit to improve performance when accessing > randomly distributed table. I think this limit is just a upper limit for > random table. In my opinion, it is not active when considering a hash > distributed table, and it is not considered when deciding the bucket number > of a hash table. > > So, even setting 24 as high as you mentioned, I think hash table always > follows its bucket number to acquire virtual segments. I think Hubert ( > [email protected]) can provide you more information how to decide bucket > number of a hash distributed table and how to decide number of virtual > segments for a query accessing mixed distributed tables. > > I want to mention another case that we have tight resource or busy > workload, query for random distributed table will not get stable number of > virtual segments as a hash table, even when hawq_rm_nvseg_perquery_perseg_ > limit > is set as high as 24. > > Best, > Yi > > On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <[email protected]> wrote: > > > I've been testing TPC-DS queries and found that I can get Randomly > > Distributed tables to outperform Hash Distributed tables by increasing > > hawq_rm_nvseg_perquery_perseg_limit on a per query basis to as high as > 24. > > > > For Hash Distributed tables, 24 is way too high. It is also not a great > > idea to make the default so high in case users are creating a mix of > Random > > and Hash Distributed Tables. > > > > Would it be possible to make this one GUC separated into two so that you > > can leave it 6 for Hash Distributed tables but another value like 16 for > > Randomly Distributed tables? > > > > This enhancement would also make it possible for later improvements in > the > > optimizer to determine how many vsegs to use. For example, some queries > > worked best set to 12 while others greatly benefited when set to 24. > > > > > > Jon Roberts > > >
