Jon, sorry for the late response; I see that I missed this one.

With regard to the memory quota of one vseg, I understand your concern now,
and it makes sense. What limit would you expect, 32GB or 64GB? I can open a
JIRA to track this and decide how far we can extend the limit.

As for hawq_rm_nvseg_perquery_limit: if you don't want it to limit your
query's resources in a large cluster, just set it to a large value, 65535 for
example. Then I think you can achieve your goal of using
hawq_rm_nvseg_perquery_perseg_limit to dynamically limit the number of vsegs
for one query, as sketched below.
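
For illustration, a minimal sketch (the 65535 value is just an arbitrarily
large cap, not a tuned recommendation; both GUCs can typically be set at the
session level before you change hawq-site.xml):

set hawq_rm_nvseg_perquery_limit = 65535;
-- with the cluster-wide cap out of the way,
-- hawq_rm_nvseg_perquery_perseg_limit (default 6) becomes the working
-- bound: at most 6 vsegs per segment for any one query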

Yi

On Tue, Jan 24, 2017 at 3:21 AM, Jon Roberts <[email protected]> wrote:

> It is common in Greenplum to use "explain analyze" to look at the query
> plan; if the query spills to disk, then increasing the statement memory
> will typically increase the speed of the query.  With hash distributed
> tables in Hawq, this will be true too, and it is the only way, short of
> rewriting the query, to make it perform better.
>
> For the really large Greenplum clusters out there, we don't see benefits
> from running 8 segments per host.  Instead, we dial this back to 4 segments
> because the network traffic in the interconnect becomes the bottleneck.
> For Hawq, where the clusters will likely be much bigger, I think we will
> have the same network limitations.  So with random distribution, we will
> probably reduce the number of vsegs to get better performance on the
> network side and then use more memory per vseg to prevent spilling to disk
> and to use more of the available resources.  That is why we need the
> ability to allocate more than 16GB of RAM per vseg.
>
> On a different but related note, I would like to see
> hawq_rm_nvseg_perquery_limit changed so that it limits the number of vsegs
> per query at the datanode level rather than across the entire cluster.
> That would make it much easier to expand and shrink a cluster without
> having to adjust GUCs each time.
>
>
>
> Jon Roberts
> Principal Engineer | [email protected] | 615-426-8661
>
> On Sun, Jan 22, 2017 at 10:05 PM, Yi Jin <[email protected]> wrote:
>
> > Hi Jon,
> >
> > For scenario 1, it is true. However, do you think it necessary to give
> > one vseg such a high memory quota? Of course, I think that if we force
> > hawq_rm_stmt_nvseg to a value different from the bucket number, the
> > optimizer will give a different plan. This is a way to force high
> > concurrency and high resource occupation.
> >
> > For scenario 2, I think that if we want more concurrency in executing
> > one query, we should advise the customer to increase
> > hawq_rm_nvseg_perquery_limit (default 512) and
> > hawq_rm_nvseg_perquery_perseg_limit (default 6), and then choose an
> > appropriate vseg memory quota (see the sketch after this paragraph). As
> > a result, HAWQ will decide to use more concurrency and more resources
> > for big queries automatically. I mean that HAWQ will choose the number
> > of virtual segments automatically; it will not always use a high number
> > of virtual segments.
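> >
> > For instance, a rough sketch of those knobs (the values and the
> > resource queue DDL form here are from memory and worth checking against
> > the docs; pg_default is the built-in default queue):
> >
> > set hawq_rm_nvseg_perquery_limit = 1024;
> > set hawq_rm_nvseg_perquery_perseg_limit = 12;
> > alter resource queue pg_default with (vseg_resource_quota='mem:2gb');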
> >
> > For scenario 3, it is recommended not to assign all system memory to
> > HAWQ on the segment side. It is a good idea to leave some for the
> > system and for the other applications.
> >
> > I think it is fine to support a higher memory limit for one virtual
> > segment, but I still have no idea why it is necessary.
> >
> > As HAWQ uses a different memory allocation strategy compared with
> > Greenplum, it is true that the external behavior of HAWQ is different.
> > The virtual segment (vseg) is a critical concept in HAWQ for managing
> > resource consumption and query QE concurrency. In Greenplum, the QE
> > concurrency is fixed, i.e. the number of QEs for one query execution is
> > fixed; this is why a Greenplum user just needs to think about memory
> > consumption in one segment, as all segments consume the same amount of
> > memory for the same query. Back in HAWQ, this number is dynamic: you
> > can easily change the resource queue definition or the statement
> > resource definition to change it, and you don't have to install and
> > configure more segments on those nodes to force the QE concurrency.
> >
> > Consider a scenario where a lot of small queries arrive concurrently:
> > HAWQ just allocates a small number of vsegs for each query, without
> > resource waste, and this makes HAWQ able to support high concurrency
> > much better than a traditional MPP-architecture database and to achieve
> > high throughput. Conversely, for large queries, HAWQ can automatically
> > increase the number of vsegs to a very high number if you set those
> > limit GUCs high as well.
> >
> > Best,
> > Yi
> >
> > On Mon, Jan 23, 2017 at 12:30 PM, Jon Roberts <[email protected]> wrote:
> >
> > > I've been thinking about these scenarios:
> > >
> > > 1.  Hash distributed tables with a fixed number of buckets.
> > > If the tables were built using the defaults, buckets = 6 * number of
> > > nodes.  So you basically have 6 vsegs per host.  Multiply that by 16GB
> > > and you can only use 96GB of the 256GB of RAM per node.
> > >
> > > 2.  A user has random tables but doesn't understand that they can
> > > increase the number of vsegs.
> > > This will be common for users that come from Greenplum.  They again
> > > can only set statement memory to 16GB, so they are stuck with a max of
> > > 96GB of RAM usage.
> > >
> > > 3.  User increases vsegs and statement memory.
> > > They may run out of memory if too aggressive with the settings.
> > >
> > > - I think we should be able to specify statement memory higher than
> > > 16GB.  Maybe the limit should be something much higher, such as 1TB.
> > >
> > > - The optimizer should limit the number of vsegs based on the
> > > statement memory setting to prevent OOM.  You could do the opposite
> > > too (limit memory and use the vseg setting provided).  Greenplum can
> > > limit the amount of memory, but we have two dials to adjust: vsegs and
> > > memory.
> > >
> > >
> > > Jon Roberts
> > >
> > > On Sun, Jan 22, 2017 at 5:20 PM, Yi Jin <[email protected]> wrote:
> > >
> > > > Hi Jon,
> > > >
> > > > That GUC's limit means that for one virtual segment, the maximum
> > > > consumable memory is 16GB. On one segment/node, there may be
> > > > multiple vsegs allocated to run queries, so if a node has 256GB
> > > > expected to be consumed by HAWQ, it will have at most 16 vsegs
> > > > running concurrently.
> > > >
> > > > hawq_rm_stmt_vseg_memory sets the statement-level vseg memory
> > > > consumption. It is required to specify hawq_rm_stmt_nvseg as well:
> > > > only when hawq_rm_stmt_nvseg is greater than 0 is
> > > > hawq_rm_stmt_vseg_memory activated, regardless of the original
> > > > target resource queue's vseg resource quota definition. For example,
> > > > you can set hawq_rm_stmt_vseg_memory to 16gb and hawq_rm_stmt_nvseg
> > > > to 256; if you have a cluster of 16 nodes with 256GB each, and your
> > > > target resource queue can use 100% of the cluster resources, you
> > > > will have 16 vsegs running per node, consuming all memory resources
> > > > for this query, as in the sketch below.
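> > > >
> > > > A minimal sketch of that example (the table name is a placeholder):
> > > >
> > > > set hawq_rm_stmt_vseg_memory = '16gb';
> > > > set hawq_rm_stmt_nvseg = 256;
> > > > -- 256 vsegs / 16 nodes = 16 vsegs per node
> > > > -- 16 vsegs * 16GB = 256GB consumed per node for this query
> > > > select count(*) from big_table;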
> > > >
> > > > Best,
> > > > Yi
> > > >
> > > > On Sat, Jan 21, 2017 at 3:40 PM, Lei Chang <[email protected]> wrote:
> > > >
> > > > > hawq_rm_stmt_vseg_memory and hawq_rm_stmt_nvseg need to be used
> > > > > together to set the specific number of vsegs and the vseg memory.
> > > > > And hawq_rm_stmt_nvseg should be less than
> > > > > hawq_rm_nvseg_perquery_perseg_limit.
> > > > >
> > > > > set hawq_rm_stmt_vseg_memory = '2GB';
> > > > > set hawq_rm_stmt_nvseg = 6;
> > > > >
> > > > > 16GB looks somewhat small for big dedicated machines: if 16GB is
> > > > > the per-virtual-segment memory and 8 segments are used, it only
> > > > > uses 128GB.
> > > > >
> > > > > Cheers
> > > > > Lei
> > > > >
> > > > >
> > > > > On Fri, Jan 20, 2017 at 9:11 PM, Jon Roberts <[email protected]> wrote:
> > > > >
> > > > > > Why is there a limit of 16GB for hawq_rm_stmt_vseg_memory?  A
> > > > > > cluster with 256GB per node and dedicated to HAWQ may certainly
> > > > > > want to utilize more memory per segment.  Is there something I'm
> > > > > > missing regarding statement memory?
> > > > > >
> > > > > > Secondly, does the number of vsegs for a query get influenced by
> > > > > > the statement memory, or does it just look at the plan and
> > > > > > hawq_rm_nvseg_perquery_perseg_limit?
> > > > > >
> > > > > >
> > > > > > Jon Roberts
> > > > > >
> > > > >
> > > >
> > >
> >
>
