Please see my comments inline.
On Sat, Sep 17, 2016 at 3:07 AM, Kyle Dunn <kd...@pivotal.io> wrote:
> In an ongoing evaluation of HAWQ in Azure, we've encountered some
> sub-optimal network performance. It would be great to get some additional
> information about a few server parameters related to the network:
> - gp_max_packet_size
> The default is documented at 8192. Why was this number chosen? Should
> this value be aligned with the network infrastructure's configured MTU,
> accounting for the packet header size of the chosen interconnect type?
> (Azure only supports MTU 1500 and has been showing better reliability using
> TCP in Greenplum)
8K is an empirical value from our interconnect performance evaluation on
physical hardware; it showed the best performance there. We have not
benchmarked on Azure, and it looks like UDP on Azure is not stable.
You can set "gp_interconnect_log_stats" to see per-query interconnect
statistics, and you can also use ifconfig to check for packet errors on the
NIC.
If the network is not stable, it is worth trying to decrease the value to
below 1500 so that the user-space packet size fits within the kernel's
maximal packet size (the MTU). Note that decreasing the value increases the
CPU cost of marshalling/unmarshalling the packets, so there is a tradeoff.
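To make the "align with the MTU" point concrete, here is a back-of-the-envelope calculation (my own illustration, not from the HAWQ source): a UDP datagram avoids IP fragmentation only when its payload fits in the MTU minus the IPv4 header (20 bytes) and the UDP header (8 bytes).

```python
# IPv4 and UDP header sizes, per RFC 791 / RFC 768.
IPV4_HEADER = 20
UDP_HEADER = 8

def max_udp_payload(mtu):
    """Largest UDP payload that fits in one link-layer frame without
    IP fragmentation (ignoring IP options)."""
    return mtu - IPV4_HEADER - UDP_HEADER

print(max_udp_payload(1500))  # Azure's MTU -> 1472
print(max_udp_payload(9000))  # jumbo frames -> 8972, so an 8K packet fits
```

So on Azure a gp_max_packet_size of at most 1472 keeps each interconnect packet in a single frame, whereas the 8192 default was tuned for networks where jumbo frames are available.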
> - gp_interconnect_type
> The docs claim UDPIFC is the default, UDP is the observed default. Do
> the recommendations around which setting to use vary in an IaaS environment
> (AWS or Azure)?
Which doc? When we released UDPIFC for GPDB, we kept the old UDP
implementation and added UDPIFC as a new option to avoid potential
regressions, since there were a lot of UDP deployments of GPDB at that time.
After UDPIFC was released, it proved to be much more stable and to perform
better than UDP. So when we released HAWQ, we simply replaced the UDP
implementation with UDPIFC but kept the name UDP. So UDP is UDPIFC in HAWQ.
There are two flow control methods in UDPIFC that I'd suggest you try via
Gp_interconnect_fc_method (INTERCONNECT_FC_METHOD_CAPACITY and
INTERCONNECT_FC_METHOD_LOSS).
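A sketch of how you could A/B the two methods at the session level (the value spellings 'capacity' / 'loss' are my recollection of the GUC settings; verify them on your cluster, e.g. with `gpconfig -s gp_interconnect_fc_method`, before relying on this):

```sql
-- Try each flow-control method and re-run the same benchmark queries.
SET gp_interconnect_fc_method = 'capacity';
-- ... run test queries, record timings ...
SET gp_interconnect_fc_method = 'loss';
-- ... run the same test queries again ...
```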
> - gp_interconnect_queue_depth
> My naive read of this is performance can be traded off for (potentially
> significant) RAM utilization. Is there additional detail around turning
> this knob? How does the interaction between this and the underlying NIC
> queue depth affect performance? As an example, in Azure, disabling TX
> queuing (ifconfig eth0 txqueuelen 0) on the virtual NIC improved benchmark
> performance, as the underlying Hyper-V host is doing its own queuing
This is an application-level queue, used for caching and for handling
out-of-order and lost packets.
According to our past performance testing on physical hardware, increasing
it to a large value does not show much benefit, while too small a value does
hurt performance. But it needs more testing on Azure, I think.
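On the RAM question, a rough sizing sketch may help. It assumes each interconnect connection can buffer up to gp_interconnect_queue_depth packets of gp_max_packet_size bytes; that is my simplification, and the actual accounting in HAWQ may differ:

```python
# Rough upper bound on interconnect receive-buffer memory, assuming
# (queue depth) x (packet size) bytes buffered per connection.
def interconnect_buffer_mb(queue_depth, packet_size, n_connections):
    return queue_depth * packet_size * n_connections / (1024 * 1024)

# e.g. queue depth 4, 8 KB packets, 1000 concurrent connections
print(interconnect_buffer_mb(4, 8192, 1000))  # -> 31.25 MB
```

Under those assumptions, the memory grows linearly in the queue depth, which is why a deep queue mostly costs RAM rather than buying throughput.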
> *Kyle Dunn | Data Engineering | Pivotal*
> Direct: 303.905.3171 <3039053171> | Email: kd...@pivotal.io