Note that some L2 tunables may depend heavily on the NIC driver the virtual
machine (VM) is using.
e.g. with a PCI SR-IOV VF or PCI device assignment, the NIC in a VM behaves
like a physical NIC, so some L2 tunables may be set to the usual default
values; however, for NICs implemented with para-virtualization, other values
for those tunables (e.g. tx queue length, or NIC offloading techniques such as
GSO and TSO) probably work better.
2016-09-17 12:52 GMT+08:00 Lei Chang <lei_ch...@apache.org>:
> Here is some more information around hawq interconnect. But NOTE that the
> default value tuning is all on *physical* hardware and not on Azure. On
> amazon and vmware, looks all default settings work fine.
> · gp_interconnect_type: Sets the protocol used for inter-node
> communication. Valid values are "tcp" and "udp"; "udp" is the new UDP
> interconnect implementation with flow control. Default value is "udp".
> · gp_interconnect_fc_method: Sets the flow control method used for
> UDP interconnect. Valid values are "capacity" and "loss". For “capacity”
> based flow control, senders do not send packets when receivers do not have
> capacity. "Loss"-based flow control builds on "capacity"-based flow
> control, and additionally tunes the sending speed according to packet
> losses. Default value is "loss".
> · gp_interconnect_snd_queue_depth: A new parameter used to specify
> the average size of a send queue. The buffer pool size for each send
> process can be calculated as gp_interconnect_snd_queue_depth * the number
> of processes in the downstream gang. The default value is 2.
> · gp_interconnect_cache_future_packets: A new parameter used to
> control whether future packets are cached at receiver side. Default value
> is "true".
> · gp_udp_bufsize_k: gp_udp_bufsize_k is changed from "PGC_SUSET" to
> "PGC_BACKEND" so that customers can customize the size of the socket
> buffers used by the interconnect. The maximal value is changed to
> 32768 KB = 32 MB.
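As a rough illustration of the send-buffer sizing above: assuming each queued slot holds one packet of gp_max_packet_size bytes, and taking an illustrative (not default) downstream gang of 16 processes, the per-sender memory works out as follows:

```shell
# Back-of-the-envelope buffer pool estimate for one UDP interconnect sender.
# DOWNSTREAM_PROCS is an illustrative assumption; the other two are the
# defaults mentioned in this thread.
SND_QUEUE_DEPTH=2        # gp_interconnect_snd_queue_depth (default 2)
MAX_PACKET_SIZE=8192     # gp_max_packet_size in bytes (default 8192)
DOWNSTREAM_PROCS=16      # processes in the downstream gang (assumed)

POOL_BYTES=$((SND_QUEUE_DEPTH * DOWNSTREAM_PROCS * MAX_PACKET_SIZE))
echo "buffer pool per send process: $((POOL_BYTES / 1024)) KB"
```

With these numbers the pool is small (256 KB), but it scales linearly with the gang size, which is why large clusters need more care.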
> For UDP interconnect, end users should tune the OS kernel memory used by
> sockets. On Linux, these are
> · net.core.rmem_max
> · net.core.wmem_max
> · txqueuelen (Transmit Queue Length)
> Recommended values for net.core.rmem_max and net.core.wmem_max are 2 MB (or
> greater), and txqueuelen can be increased if the OS introduces packet losses
> due to kernel ring buffer overflow. If the number of nodes is large, users
> should pay attention to the queue depth and socket buffer size settings to
> avoid potential packet losses due to a small OS buffer size.
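A minimal sketch of those Linux settings, assuming an interface named eth0 (an assumption; substitute your own) and root privileges; the txqueuelen value is illustrative:

```shell
# Raise the socket buffer ceilings used by the UDP interconnect (2 MB here,
# per the recommendation above).
sysctl -w net.core.rmem_max=2097152
sysctl -w net.core.wmem_max=2097152

# Increase the transmit queue length if the kernel ring buffer overflows;
# "eth0" and "10000" are assumptions -- adjust to your NIC and workload.
ip link set dev eth0 txqueuelen 10000
```

To make the sysctl changes survive a reboot, add them to /etc/sysctl.conf (or a file under /etc/sysctl.d/) once they prove helpful.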
> On Sat, Sep 17, 2016 at 12:44 PM, Lei Chang <lei_ch...@apache.org> wrote:
> > please see the comments inline
> > On Sat, Sep 17, 2016 at 3:07 AM, Kyle Dunn <kd...@pivotal.io> wrote:
> >> In an ongoing evaluation of HAWQ in Azure, we've encountered some
> >> sub-optimal network performance. It would be great to get some
> >> information about a few server parameters related to the network:
> >> - gp_max_packet_size
> >> The default is documented at 8192. Why was this number chosen? Should
> >> this value be aligned with the network infrastructure's configured MTU,
> >> accounting for the packet header size of the chosen interconnect type?
> >> (Azure only supports MTU 1500 and has been showing better reliability
> >> using TCP in Greenplum)
> > 8K is an empirical value from when we evaluated the interconnect
> > performance on physical hardware; it showed the optimal performance there.
> > But it has not been benchmarked on Azure, and it looks like UDP on Azure
> > is not stable. You can set "gp_interconnect_log_stats" to see the
> > statistics about the queries, and you can also use ifconfig to see the
> > packet error counts.
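One way to sketch that ifconfig error check on a modern Linux box is to read the per-interface counters straight from /proc/net/dev (the column positions assume the standard layout of that file):

```shell
# Print RX error/drop counters per interface from /proc/net/dev.
# Non-zero, growing counts here suggest kernel-level packet loss.
awk 'NR > 2 {
    gsub(/:/, " ")                               # strip the "iface:" colon
    printf "%-8s rx_errs=%s rx_drop=%s\n", $1, $4, $5
}' /proc/net/dev
```

Sampling this before and after a heavy query makes it easy to see whether the interconnect traffic itself is causing the drops.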
> > If the network is not stable, it is worth trying to decrease the value to
> > less than 1500 to align the user-space packet size with the maximal kernel
> > packet size. But decreasing the value increases the CPU cost for
> > marshalling/unmarshalling the packets, so there is a tradeoff here.
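To make that tradeoff concrete, here is an illustrative fragmentation calculation, assuming plain IPv4/UDP (28 bytes of headers) and no extra encapsulation overhead:

```shell
MTU=1500                 # Azure's supported MTU
IP_UDP_HEADERS=28        # 20-byte IPv4 header + 8-byte UDP header (assumed)
APP_PACKET=8192          # gp_max_packet_size default

PAYLOAD_PER_FRAME=$((MTU - IP_UDP_HEADERS))
# Ceiling division: IP fragments needed to carry one interconnect packet.
FRAGMENTS=$(( (APP_PACKET + PAYLOAD_PER_FRAME - 1) / PAYLOAD_PER_FRAME ))
echo "one $APP_PACKET-byte packet -> $FRAGMENTS IP fragments"
```

Since losing any one fragment costs the whole 8 KB packet, a lossy network multiplies the effective loss rate, which is the case for dropping gp_max_packet_size below the MTU.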
> >> - gp_interconnect_type
> >> The docs claim UDPIFC is the default, but UDP is the observed default. Do
> >> the recommendations around which setting to use vary in an IaaS
> >> environment (AWS or Azure)?
> > Which doc? When we released UDPIFC for GPDB, we kept the old UDP and
> > added UDPIFC to avoid potential regressions, since there were a lot of UDP
> > deployments for GPDB at that time. After UDPIFC was released, it was shown
> > that UDPIFC is much more stable and performs better than UDP. So in HAWQ,
> > we just replaced UDP with UDPIFC but kept UDP as the name. So UDP is
> > UDPIFC in HAWQ.
> > There are two flow control methods in UDPIFC that I'd suggest you try:
> > Gp_interconnect_fc_method (INTERCONNECT_FC_METHOD_CAPACITY &
> > INTERCONNECT_FC_METHOD_LOSS).
> >> - gp_interconnect_queue_depth
> >> My naive read of this is that performance can be traded off for (possibly
> >> significant) RAM utilization. Is there additional detail around turning
> >> this knob? How does the interaction between this and the underlying NIC
> >> queue depth affect performance? As an example, in Azure, disabling TX
> >> queuing (ifconfig eth0 txqueue 0) on the virtual NIC improved benchmark
> >> performance, as the underlying HyperV host is doing its own queuing
> >> anyway.
> > This queue is an application-level queue, used for caching and for
> > handling out-of-order and lost packets.
> > According to our past performance testing on physical hardware, setting
> > it to a large value does not show a lot of benefits, but a too-small
> > value does impact performance. It needs more testing on Azure, I think.
> >> Thanks,
> >> Kyle
> >> --
> >> *Kyle Dunn | Data Engineering | Pivotal*
> >> Direct: 303.905.3171 | Email: kd...@pivotal.io