rok commented on code in PR #13873:
URL: https://github.com/apache/arrow/pull/13873#discussion_r950629152


##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect incoming headers and
 fail the request; hence, they can be used to implement custom
 authentication methods.
 
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using the default gRPC transport, options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+   .. tab-item:: C++
+
+      .. code-block:: cpp
+
+         auto options = FlightClientOptions::Defaults();
+         // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+   .. tab-item:: Python
+
+      .. code-block:: python
+
+         # Set a very low limit at the gRPC layer to fail all calls
+         generic_options = [("grpc.max_receive_message_length", 4)]
+         client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections, which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Reusing clients avoids this issue, as in the sketch below.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use ``netstat`` to see the number of open connections.
+For debugging, set the environment variables ``GRPC_VERBOSITY=info`` and
+``GRPC_TRACE=http``: this will print the initial headers (on both sides) so you
+can see whether gRPC established the connection or not. It will also print when
+a message is sent, so you can tell whether the connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort of dummy
+RPC should be made, as in the sketch below.
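+
+One possible dummy RPC, sketched in Python (the address is illustrative):
+
+.. code-block:: python
+
+   import pyarrow.flight as flight
+
+   client = flight.FlightClient("grpc://localhost:8815")
+   try:
+       # Listing flights forces gRPC to actually establish the connection,
+       # surfacing errors that constructing the client would not report.
+       list(client.list_flights())
+   except flight.FlightUnavailableError as e:
+       print("Flight server is not reachable:", e)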
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies.  However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call
+``malloc_trim``, or call ``ReleaseUnused`` on the system pool.
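+
+From Python, a rough equivalent of the latter (assuming the allocations were
+made on the system allocator) is:
+
+.. code-block:: python
+
+   import pyarrow as pa
+
+   # Ask the system memory pool to return unused memory to the OS.
+   # Whether this has any effect depends on the underlying allocator.
+   pa.system_memory_pool().release_unused()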
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (a "cached thread pool", in Java parlance).
+glibc malloc keeps some per-thread state, and the default tuning never clears
+its caches in some workloads, but you can explicitly tell malloc to dump its
+caches. See ARROW-16697_ as an example, and the sketch below.
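+
+On Linux with glibc, one way to do this from Python is to call ``malloc_trim``
+through ``ctypes`` (a rough sketch; the library name is platform dependent):
+
+.. code-block:: python
+
+   import ctypes
+
+   # Ask glibc malloc to release cached free memory back to the OS.
+   # Only meaningful on glibc-based Linux systems.
+   ctypes.CDLL("libc.so.6").malloc_trim(0)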
+
+There are basically two ways to handle excessive traffic:
+
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> reject connections/requests when under load, and have
+  clients retry with backoff (see the sketch below). This also gives an
+  opportunity to retry with a different node. Not everyone gets serviced, but
+  quality of service stays consistent.
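+
+A minimal client-side retry-with-backoff sketch (the error type and timings are
+illustrative and should be tuned for the deployment):
+
+.. code-block:: python
+
+   import time
+   import pyarrow.flight as flight
+
+   def do_get_with_backoff(client, ticket, max_attempts=5):
+       # Retry a DoGet call with exponential backoff when the server rejects
+       # or drops the request under load.
+       for attempt in range(max_attempts):
+           try:
+               return client.do_get(ticket).read_all()
+           except flight.FlightUnavailableError:
+               # Back off before retrying; ideally against a different node.
+               time.sleep(0.1 * (2 ** attempt))
+       raise RuntimeError("Flight server unavailable after retries")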

Review Comment:
   Switched to proposed text.


