Please see my comments inline.
On Sat, Sep 17, 2016 at 3:07 AM, Kyle Dunn <kd...@pivotal.io> wrote:
> In an ongoing evaluation of HAWQ in Azure, we've encountered some
> sub-optimal network performance. It would be great to get some additional
> information about a few server parameters related to the network:
> - gp_max_packet_size
> The default is documented at 8192. Why was this number chosen? Should
> this value be aligned with the network infrastructure's configured MTU,
> accounting for the packet header size of the chosen interconnect type?
> (Azure only supports MTU 1500 and has been showing better reliability using
> TCP in Greenplum)
8K is an empirical value from our interconnect performance evaluation on
physical hardware; it showed the best performance there. We have not
benchmarked on Azure, and it looks like UDP on Azure is not stable.
You can set "gp_interconnect_log_stats" to see per-query interconnect
statistics, and you can also use ifconfig to check for packet errors on the
NIC.
If the network is not stable, it is worth trying to decrease the value to
below 1500 so that the user-space packet size fits within the kernel's
maximal packet size (the MTU). Note that decreasing the value increases the
CPU cost of marshalling/unmarshalling the packets, so there is a tradeoff.
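To make the "align with the MTU" point concrete, here is a back-of-the-envelope calculation (my own illustration, not from the HAWQ source): a UDP datagram avoids IP fragmentation only when its payload fits in the MTU minus the IPv4 header (20 bytes) and the UDP header (8 bytes).

```python
# IPv4 and UDP header sizes, per RFC 791 / RFC 768.
IPV4_HEADER = 20
UDP_HEADER = 8

def max_udp_payload(mtu):
    """Largest UDP payload that fits in one link-layer frame without
    IP fragmentation (ignoring IP options)."""
    return mtu - IPV4_HEADER - UDP_HEADER

print(max_udp_payload(1500))  # Azure's MTU -> 1472
print(max_udp_payload(9000))  # jumbo frames -> 8972, so an 8K packet fits
```

So on Azure a gp_max_packet_size of at most 1472 keeps each interconnect packet in a single frame, whereas the 8192 default was tuned for networks where jumbo frames are available.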
> - gp_interconnect_type
> The docs claim UDPIFC is the default, UDP is the observed default. Do
> the recommendations around which setting to use vary in an IaaS environment
> (AWS or Azure)?
Which doc? When we released UDPIFC for GPDB, we kept the old UDP
implementation and added UDPIFC as a new option to avoid potential
regressions, since there were a lot of UDP deployments of GPDB at that time.
After UDPIFC was released, it proved to be much more stable and to perform
better than UDP. So when we released HAWQ, we simply replaced the UDP
implementation with UDPIFC but kept the name UDP. So UDP is UDPIFC in HAWQ.
There are two flow control methods in UDPIFC that I'd suggest you try via
Gp_interconnect_fc_method (INTERCONNECT_FC_METHOD_CAPACITY and
INTERCONNECT_FC_METHOD_LOSS).
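A sketch of how you could A/B the two methods at the session level (the value spellings 'capacity' / 'loss' are my recollection of the GUC settings; verify them on your cluster, e.g. with `gpconfig -s gp_interconnect_fc_method`, before relying on this):

```sql
-- Try each flow-control method and re-run the same benchmark queries.
SET gp_interconnect_fc_method = 'capacity';
-- ... run test queries, record timings ...
SET gp_interconnect_fc_method = 'loss';
-- ... run the same test queries again ...
```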
> - gp_interconnect_queue_depth
> My naive read of this is performance can be traded off for (potentially
> significant) RAM utilization. Is there additional detail around turning
> this knob? How does the interaction between this and the underlying NIC
> queue depth affect performance? As an example, in Azure, disabling TX
> queuing (ifconfig eth0 txqueuelen 0) on the virtual NIC improved benchmark
> performance, as the underlying Hyper-V host is doing its own queuing
This is an application-level queue, used for caching and for handling
out-of-order and lost packets.
According to our past performance testing on physical hardware, increasing
it to a large value does not show much benefit, while too small a value does
hurt performance. But it needs more testing on Azure, I think.
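On the RAM question, a rough sizing sketch may help. It assumes each interconnect connection can buffer up to gp_interconnect_queue_depth packets of gp_max_packet_size bytes; that is my simplification, and the actual accounting in HAWQ may differ:

```python
# Rough upper bound on interconnect receive-buffer memory, assuming
# (queue depth) x (packet size) bytes buffered per connection.
def interconnect_buffer_mb(queue_depth, packet_size, n_connections):
    return queue_depth * packet_size * n_connections / (1024 * 1024)

# e.g. queue depth 4, 8 KB packets, 1000 concurrent connections
print(interconnect_buffer_mb(4, 8192, 1000))  # -> 31.25 MB
```

Under those assumptions, the memory grows linearly in the queue depth, which is why a deep queue mostly costs RAM rather than buying throughput.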
> *Kyle Dunn | Data Engineering | Pivotal*
> Direct: 303.905.3171 <3039053171> | Email: kd...@pivotal.io