Re: hawq_rm_nvseg_perquery_perseg_limit

Jon Roberts Tue, 06 Dec 2016 13:21:51 -0800

Oh, I missed that there is default_hash_table_bucket_number which is
calculated at init time to be 6 * number of nodes which is separate from
hawq_rm_nvseg_perquery_perseg_limit for random and external tables.  So it
is already separate.


Any chance we can get the bucketnum from gp_distribution_policy displayed
in psql when a user uses "\d" to define the table?



Jon Roberts
Principal Engineer | [email protected] | 615-426-8661

On Tue, Dec 6, 2016 at 2:53 PM, Yi Jin <[email protected]> wrote:

> Hi Jon,
>
> I think to me it is a good news that we can increase
> hawq_rm_nvseg_perquery_perseg_limit to improve performance when accessing
> randomly distributed table. I think this limit is just a upper limit for
> random table. In my opinion, it is not active when considering a hash
> distributed table, and it is not considered when deciding the bucket number
> of a hash table.
>
> So, even setting 24 as high as you mentioned, I think hash table always
> follows its bucket number to acquire virtual segments. I think Hubert (
> [email protected]) can provide you more information how to decide bucket
> number of a hash distributed table and how to decide number of virtual
> segments for a query accessing mixed distributed tables.
>
> I want to mention another case that we have tight resource or busy
> workload, query for random distributed table will not get stable number of
> virtual segments as a hash table, even when hawq_rm_nvseg_perquery_perseg_
> limit
> is set as high as 24.
>
> Best,
> Yi
>
> On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <[email protected]> wrote:
>
> > I've been testing TPC-DS queries and found that I can get Randomly
> > Distributed tables to outperform Hash Distributed tables by increasing
> > hawq_rm_nvseg_perquery_perseg_limit on a per query basis to as high as
> 24.
> >
> > For Hash Distributed tables, 24 is way too high.  It is also not a great
> > idea to make the default so high in case users are creating a mix of
> Random
> > and Hash Distributed Tables.
> >
> > Would it be possible to make this one GUC separated into two so that you
> > can leave it 6 for Hash Distributed tables but another value like 16 for
> > Randomly Distributed tables?
> >
> > This enhancement would also make it possible for later improvements in
> the
> > optimizer to determine how many vsegs to use.  For example, some queries
> > worked best set to 12 while others greatly benefited when set to 24.
> >
> >
> > Jon Roberts
> >
>

Re: hawq_rm_nvseg_perquery_perseg_limit

Reply via email to