Note that some L2 tunables may depend heavily on the NIC driver the virtual
machine (VM) is using.
e.g. with a PCI SR-IOV VF or PCI device assignment, the NIC in a VM behaves
like a physical NIC, so some L2 tunables may be set to the usual default
values; however, for NICs implemented with para-virtualization, other values
for those tunables (e.g. tx queue length, or NIC offloading techniques such as
GSO and TSO) probably work better.
2016-09-17 12:52 GMT+08:00 Lei Chang <lei_ch...@apache.org>:
> Here is some more information around hawq interconnect. But NOTE that the
> default value tuning is all on *physical* hardware and not on Azure. On
> amazon and vmware, looks all default settings work fine.
> · gp_interconnect_type: Sets the protocol used for inter-node
> communication. Valid values are "tcp" and "udp"; "udp" is the new UDP
> interconnect implementation with flow control. Default value is "udp".
> · gp_interconnect_fc_method: Sets the flow control method used for
> UDP interconnect. Valid values are "capacity" and "loss". For “capacity”
> based flow control, senders do not send packets when receivers do not have
> capacity. "Loss"-based flow control builds on "capacity"-based flow
> control, and additionally tunes the sending speed according to packet
> losses. Default value is "loss".
> · gp_interconnect_snd_queue_depth: A new parameter used to specify
> the average size of a send queue. The buffer pool size for each send
> process can be calculated as gp_interconnect_snd_queue_depth * the number
> of processes in the downstream gang. The default value is 2.
> · gp_interconnect_cache_future_packets: A new parameter used to
> control whether future packets are cached at receiver side. Default value
> is "true".
> · gp_udp_bufsize_k: gp_udp_bufsize_k is changed from "PGC_SUSET" to
> "PGC_BACKEND" so that customers can customize the size of the socket
> buffers used by the interconnect. The maximal value is changed to
> 32768 KB = 32 MB.
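As a rough illustration of the send-buffer sizing above: assuming each queued slot holds one packet of gp_max_packet_size bytes, and taking an illustrative (not default) downstream gang of 16 processes, the per-sender memory works out as follows:

```shell
# Back-of-the-envelope buffer pool estimate for one UDP interconnect sender.
# DOWNSTREAM_PROCS is an illustrative assumption; the other two are the
# defaults mentioned in this thread.
SND_QUEUE_DEPTH=2        # gp_interconnect_snd_queue_depth (default 2)
MAX_PACKET_SIZE=8192     # gp_max_packet_size in bytes (default 8192)
DOWNSTREAM_PROCS=16      # processes in the downstream gang (assumed)

POOL_BYTES=$((SND_QUEUE_DEPTH * DOWNSTREAM_PROCS * MAX_PACKET_SIZE))
echo "buffer pool per send process: $((POOL_BYTES / 1024)) KB"
```

With these numbers the pool is small (256 KB), but it scales linearly with the gang size, which is why large clusters need more care.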
> For UDP interconnect, end users should tune the OS kernel memory used by
> sockets. On Linux, these are
> · net.core.rmem_max
> · net.core.wmem_max
> · txqueuelen (Transmit Queue Length)
> Recommended values for net.core.rmem_max and net.core.wmem_max are 2 MB (or
> greater), and txqueuelen can be increased if the OS introduces packet losses
> due to kernel ring buffer overflow. If the number of nodes is large, users
> should pay attention to the queue depth and socket buffer size settings to
> avoid potential packet losses due to a small OS buffer size.
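A minimal sketch of those Linux settings, assuming an interface named eth0 (an assumption; substitute your own) and root privileges; the txqueuelen value is illustrative:

```shell
# Raise the socket buffer ceilings used by the UDP interconnect (2 MB here,
# per the recommendation above).
sysctl -w net.core.rmem_max=2097152
sysctl -w net.core.wmem_max=2097152

# Increase the transmit queue length if the kernel ring buffer overflows;
# "eth0" and "10000" are assumptions -- adjust to your NIC and workload.
ip link set dev eth0 txqueuelen 10000
```

To make the sysctl changes survive a reboot, add them to /etc/sysctl.conf (or a file under /etc/sysctl.d/) once they prove helpful.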
> On Sat, Sep 17, 2016 at 12:44 PM, Lei Chang <lei_ch...@apache.org> wrote:
> > please see the comments inline
> > On Sat, Sep 17, 2016 at 3:07 AM, Kyle Dunn <kd...@pivotal.io> wrote:
> >> In an ongoing evaluation of HAWQ in Azure, we've encountered some
> >> sub-optimal network performance. It would be great to get some
> >> information about a few server parameters related to the network:
> >> - gp_max_packet_size
> >> The default is documented at 8192. Why was this number chosen? Should
> >> this value be aligned with the network infrastructure's configured MTU,
> >> accounting for the packet header size of the chosen interconnect type?
> >> (Azure only supports MTU 1500 and has been showing better reliability
> >> using TCP in Greenplum)
> > 8K is an empirical value from when we evaluated the interconnect
> > performance on physical hardware; it showed the optimal performance there.
> > But it has not been benchmarked on Azure, and it looks like UDP on Azure
> > is not stable. You can set "gp_interconnect_log_stats" to see the
> > statistics about the queries, and you can also use ifconfig to see the
> > packet error counts.
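One way to sketch that ifconfig error check on a modern Linux box is to read the per-interface counters straight from /proc/net/dev (the column positions assume the standard layout of that file):

```shell
# Print RX error/drop counters per interface from /proc/net/dev.
# Non-zero, growing counts here suggest kernel-level packet loss.
awk 'NR > 2 {
    gsub(/:/, " ")                               # strip the "iface:" colon
    printf "%-8s rx_errs=%s rx_drop=%s\n", $1, $4, $5
}' /proc/net/dev
```

Sampling this before and after a heavy query makes it easy to see whether the interconnect traffic itself is causing the drops.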
> > If the network is not stable, it is worth trying to decrease the value to
> > less than 1500 to align the user-space packet size with the maximal kernel
> > packet size. But decreasing the value increases the CPU cost for
> > marshalling/unmarshalling the packets, so there is a tradeoff here.
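To make that tradeoff concrete, here is an illustrative fragmentation calculation, assuming plain IPv4/UDP (28 bytes of headers) and no extra encapsulation overhead:

```shell
MTU=1500                 # Azure's supported MTU
IP_UDP_HEADERS=28        # 20-byte IPv4 header + 8-byte UDP header (assumed)
APP_PACKET=8192          # gp_max_packet_size default

PAYLOAD_PER_FRAME=$((MTU - IP_UDP_HEADERS))
# Ceiling division: IP fragments needed to carry one interconnect packet.
FRAGMENTS=$(( (APP_PACKET + PAYLOAD_PER_FRAME - 1) / PAYLOAD_PER_FRAME ))
echo "one $APP_PACKET-byte packet -> $FRAGMENTS IP fragments"
```

Since losing any one fragment costs the whole 8 KB packet, a lossy network multiplies the effective loss rate, which is the case for dropping gp_max_packet_size below the MTU.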
> >> - gp_interconnect_type
> >> The docs claim UDPIFC is the default, but UDP is the observed default. Do
> >> the recommendations around which setting to use vary in an IaaS
> >> environment (AWS or Azure)?
> > Which doc? When we released UDPIFC for GPDB, we kept the old UDP and
> > added UDPIFC to avoid potential regressions, since there were a lot of UDP
> > deployments for GPDB at that time. After UDPIFC was released, it was shown
> > that UDPIFC is much more stable and performs better than UDP. So in HAWQ,
> > we just replaced UDP with UDPIFC but kept UDP as the name. So UDP is
> > UDPIFC in HAWQ.
> > There are two flow control methods in UDPIFC that I'd suggest you try:
> > Gp_interconnect_fc_method (INTERCONNECT_FC_METHOD_CAPACITY &
> > INTERCONNECT_FC_METHOD_LOSS).
> >> - gp_interconnect_queue_depth
> >> My naive read of this is that performance can be traded off for (possibly
> >> significant) RAM utilization. Is there additional detail around turning
> >> this knob? How does the interaction between this and the underlying NIC
> >> queue depth affect performance? As an example, in Azure, disabling TX
> >> queuing (ifconfig eth0 txqueue 0) on the virtual NIC improved benchmark
> >> performance, as the underlying HyperV host is doing its own queuing
> >> anyway.
> > This queue is an application-level queue, used for caching and for
> > handling out-of-order and lost packets.
> > According to our past performance testing on physical hardware, setting
> > it to a large value does not show a lot of benefits, but a too-small
> > value does impact performance. It needs more testing on Azure, I think.
> >> Thanks,
> >> Kyle
> >> --
> >> *Kyle Dunn | Data Engineering | Pivotal*
> >> Direct: 303.905.3171 | Email: kd...@pivotal.io