I checked the code for describing table using commands \d and \d+. I think in current version we cannot get the bucket number through just \d command. Maybe we can get a chance to improve this in future. This should be easy to add because describe uses sql statements to query expected information.
On Wed, Dec 7, 2016 at 8:20 AM, Jon Roberts <[email protected]> wrote: > Oh, I missed that there is default_hash_table_bucket_number which is > calculated at init time to be 6 * number of nodes which is separate from > hawq_rm_nvseg_perquery_perseg_limit for random and external tables. So it > is already separate. > > Any chance we can get the bucketnum from gp_distribution_policy displayed > in psql when a user uses "\d" to define the table? > > > > Jon Roberts > Principal Engineer | [email protected] | 615-426-8661 > > On Tue, Dec 6, 2016 at 2:53 PM, Yi Jin <[email protected]> wrote: > > > Hi Jon, > > > > I think to me it is a good news that we can increase > > hawq_rm_nvseg_perquery_perseg_limit to improve performance when > accessing > > randomly distributed table. I think this limit is just a upper limit for > > random table. In my opinion, it is not active when considering a hash > > distributed table, and it is not considered when deciding the bucket > number > > of a hash table. > > > > So, even setting 24 as high as you mentioned, I think hash table always > > follows its bucket number to acquire virtual segments. I think Hubert ( > > [email protected]) can provide you more information how to decide bucket > > number of a hash distributed table and how to decide number of virtual > > segments for a query accessing mixed distributed tables. > > > > I want to mention another case that we have tight resource or busy > > workload, query for random distributed table will not get stable number > of > > virtual segments as a hash table, even when > hawq_rm_nvseg_perquery_perseg_ > > limit > > is set as high as 24. > > > > Best, > > Yi > > > > On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <[email protected]> wrote: > > > > > I've been testing TPC-DS queries and found that I can get Randomly > > > Distributed tables to outperform Hash Distributed tables by increasing > > > hawq_rm_nvseg_perquery_perseg_limit on a per query basis to as high as > > 24. > > > > > > For Hash Distributed tables, 24 is way too high. It is also not a > great > > > idea to make the default so high in case users are creating a mix of > > Random > > > and Hash Distributed Tables. > > > > > > Would it be possible to make this one GUC separated into two so that > you > > > can leave it 6 for Hash Distributed tables but another value like 16 > for > > > Randomly Distributed tables? > > > > > > This enhancement would also make it possible for later improvements in > > the > > > optimizer to determine how many vsegs to use. For example, some > queries > > > worked best set to 12 while others greatly benefited when set to 24. > > > > > > > > > Jon Roberts > > > > > >
