lidavidm commented on code in PR #13873:
URL: https://github.com/apache/arrow/pull/13873#discussion_r950219203
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
Review Comment:
Is there a more realistic example of a setting you would want to customize?
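(Illustrative sketch, not part of the PR: more realistic settings one might tune through `generic_options`, using the Python client. The URI and values are placeholders; gRPC channel arguments are identified by their string names such as "grpc.max_receive_message_length".)
```python
import pyarrow.flight as flight

# Placeholder URI; channel arguments are (key, value) pairs handed to gRPC.
generic_options = [
    ("grpc.max_receive_message_length", 64 * 1024 * 1024),  # allow 64 MiB messages
    ("grpc.keepalive_time_ms", 30_000),                      # probe idle connections
]
client = flight.FlightClient("grpc://localhost:8815",
                             generic_options=generic_options)
```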
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
Review Comment:
Was it closing the clients that was the issue, or creating new ones?
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
+
+Limiting DoPut Batch Size
+--------------------------
+
+You may wish to limit the maximum size a client can submit to a server through
+DoPut, to prevent a request from taking up too much memory on the server. On
+the client-side, set `write_size_limit_bytes` on `FlightClientOptions`. On the
+server-side, set the gRPC option `GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH`.
+The client-side option will return an error that can be retried with smaller batches,
+while the server-side limit will close out the connection. Setting both can be wise, since
+the former provides a better user experience but the latter may be necessary to defend
+against impolite clients.
+
+Closing unresponsive connections
+--------------------------------
+
+1. A stale connection can be closed using
+   :member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
+ stop token at connection establishment time.
+
+ .. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ .. tab-item:: Java
+
+ .. code-block:: java
+
+ .. tab-item:: Python
+
+ .. code-block:: python
+
+
+2. Use client timeout.
Review Comment:
```suggestion
2. Use call timeouts. (This is a general gRPC best practice.)
```
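(Illustrative sketch, not part of the PR: a per-call timeout from the Python client. The URI, descriptor, and timeout value are placeholders.)
```python
import pyarrow.flight as flight

client = flight.FlightClient("grpc://localhost:8815")  # placeholder URI
# Timeout is in seconds and applies only to this call.
options = flight.FlightCallOptions(timeout=5)
try:
    info = client.get_flight_info(
        flight.FlightDescriptor.for_path("example"), options=options)
except flight.FlightTimedOutError:
    pass  # back off, retry, or surface the error to the caller
```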
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
Review Comment:
```suggestion
For debugging, certain environment variables enable logging in gRPC. For
example, ``env GRPC_VERBOSITY=info GRPC_TRACE=http`` will print the initial
headers (on both sides) so you can see if gRPC established the connection or
not. It will also print when a message is sent, so you can tell if the
connection is open or not.
```
And link to https://grpc.github.io/grpc/cpp/md_doc_environment_variables.html
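(Illustrative sketch, not part of the PR: since gRPC may not surface connection errors until the first call, a cheap probe RPC right after constructing the client makes failures visible early. The URI is a placeholder and list_flights is only one possible probe.)
```python
import pyarrow.flight as flight

client = flight.FlightClient("grpc://localhost:8815")  # placeholder URI
try:
    # Any cheap call works as a probe; recent pyarrow also exposes
    # FlightClient.wait_for_available() for the same purpose.
    list(client.list_flights())
except flight.FlightUnavailableError as e:
    raise RuntimeError("Flight server is unreachable") from e
```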
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
Review Comment:
```suggestion
A quick way of testing: attach to the process with a debugger and call
malloc_trim, or call ReleaseUnused on the system pool. If memory usage
drops, then likely, there is memory allocated by gRPC or by the
application that the system allocator was holding on to. This can be
adjusted in platform-specific ways; see an investigation in JIRA for
an example of how this works on Linux/glibc.
```
and link the JIRA
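(Illustrative sketch, not part of the PR: the same quick test driven from Python rather than a debugger; malloc_trim assumes Linux/glibc, and release_unused() may not exist in older pyarrow versions.)
```python
import ctypes
import pyarrow as pa

# Linux/glibc only: ask malloc to return freed arenas to the OS.
ctypes.CDLL("libc.so.6").malloc_trim(0)

# Ask Arrow's default memory pool to give unused memory back as well.
pa.default_memory_pool().release_unused()
```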
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
+
+Limiting DoPut Batch Size
Review Comment:
This is more like a Cookbook kind of thing IMO
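(Illustrative sketch, not part of the PR: limiting DoPut batch sizes from the Python client. The write_size_limit_bytes keyword and the values shown are assumptions to check against the installed pyarrow version.)
```python
import pyarrow as pa
import pyarrow.flight as flight

# Refuse to send IPC messages larger than ~16 MiB from this client.
client = flight.FlightClient("grpc://localhost:8815",
                             write_size_limit_bytes=16 * 1024 * 1024)

table = pa.table({"x": range(1_000_000)})
writer, _ = client.do_put(flight.FlightDescriptor.for_path("upload"), table.schema)
for batch in table.to_batches(max_chunksize=64 * 1024):
    writer.write_batch(batch)  # an oversized batch raises an error; retry with smaller chunks
writer.close()
```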
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
Review Comment:
This might belong better in the discussion of memory management below
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
Review Comment:
Give an example?
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
+
+Limiting DoPut Batch Size
+--------------------------
+
+You may wish to limit the maximum size a client can submit to a server through
+DoPut, to prevent a request from taking up too much memory on the server. On
+the client-side, set `write_size_limit_bytes` on `FlightClientOptions`. On the
Review Comment:
General comment: reST uses double-backticks
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
Review Comment:
`:func:`-reference the appropriate API docs?
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
Review Comment:
I'll try to dig up the reference for this
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
Review Comment:
```suggestion
threads are not necessarily cleaned up (a "cached thread pool" in Java
parlance).
```
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
Review Comment:
```suggestion
Memory management
```
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
+
+Limiting DoPut Batch Size
+--------------------------
+
+You may wish to limit the maximum size a client can submit to a server through
+DoPut, to prevent a request from taking up too much memory on the server. On
+the client-side, set `write_size_limit_bytes` on `FlightClientOptions`. On the
+server-side, set the gRPC option `GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH`.
+The client-side option will return an error that can be retried with smaller batches,
+while the server-side limit will close out the connection. Setting both can be wise, since
+the former provides a better user experience but the latter may be necessary to defend
+against impolite clients.
+
+Closing unresponsive connections
+--------------------------------
+
+1. A stale connection can be closed using
+   :member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
+ stop token at connection establishment time.
+
+ .. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ .. tab-item:: Java
+
+ .. code-block:: java
+
+ .. tab-item:: Python
+
+ .. code-block:: python
+
+
+2. Use client timeout.
+ .. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ .. tab-item:: Java
+
+ .. code-block:: java
+
+ .. tab-item:: Python
+
+ .. code-block:: python
+
+3. There is a long standing ticket for a per-write/per-read timeout instead of a per
Review Comment:
This first sentence might be better as a `.. note` since it's not actually
actionable
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
+
+Limiting DoPut Batch Size
+--------------------------
+
+You may wish to limit the maximum size a client can submit to a server through
+DoPut, to prevent a request from taking up too much memory on the server. On
+the client-side, set `write_size_limit_bytes` on `FlightClientOptions`. On the
+server-side, set the gRPC option `GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH`.
+The client-side option will return an error that can be retried with smaller batches,
+while the server-side limit will close out the connection. Setting both can be wise, since
+the former provides a better user experience but the latter may be necessary to defend
+against impolite clients.
+
+Closing unresponsive connections
+--------------------------------
+
+1. A stale connection can be closed using
+   :member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
+ stop token at connection establishment time.
+
+ .. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ .. tab-item:: Java
+
+ .. code-block:: java
+
+ .. tab-item:: Python
+
+ .. code-block:: python
+
+
+2. Use client timeout.
+ .. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ .. tab-item:: Java
+
+ .. code-block:: java
+
+ .. tab-item:: Python
+
+ .. code-block:: python
+
+3. There is a long standing ticket for a per-write/per-read timeout instead of a per
+   call timeout (ARROW-6062_), but this is not (easily) possible to implement with the
+   blocking gRPC API. For now one can also do something like set up a background thread
Review Comment:
This sort of thing should definitely be a recipe.
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
+
+Limiting DoPut Batch Size
+--------------------------
+
+You may wish to limit the maximum size a client can submit to a server through
+DoPut, to prevent a request from taking up too much memory on the server. On
+the client-side, set `write_size_limit_bytes` on `FlightClientOptions`. On the
+server-side, set the gRPC option `GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH`.
+The client-side option will return an error that can be retried with smaller batches,
+while the server-side limit will close out the connection. Setting both can be wise, since
+the former provides a better user experience but the latter may be necessary to defend
+against impolite clients.
+
+Closing unresponsive connections
+--------------------------------
+
+1. A stale connection can be closed using
+   :member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
+ stop token at connection establishment time.
+
+ .. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ .. tab-item:: Java
+
+ .. code-block:: java
+
+ .. tab-item:: Python
+
+ .. code-block:: python
+
+
+2. Use client timeout.
+ .. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ .. tab-item:: Java
+
+ .. code-block:: java
+
+ .. tab-item:: Python
+
+ .. code-block:: python
+
+3. There is a long standing ticket for a per-write/per-read timeout instead of a per
+   call timeout (ARROW-6062_), but this is not (easily) possible to implement with the
+   blocking gRPC API. For now one can also do something like set up a background thread
+   that calls cancel() on a timer and have the main thread reset the timer every time a
+   write operation completes successfully (that means one needs to use to_batches() +
+ write_batch and not write_table).
Review Comment:
```suggestion
3. Client timeouts are not great for long-running streaming calls, where it may
   be hard to choose a timeout for the entire operation. Instead, what is often
   desired is a per-read or per-write timeout so that the operation fails if it
   isn't making progress. This can be implemented with a background thread that
   calls cancel() on a timer, with the main thread resetting the timer every time
   an operation completes successfully. For a fully-worked out example, see the
   Cookbook.

   .. note:: There is a long standing ticket for a per-write/per-read timeout
      instead of a per call timeout (ARROW-6062_), but this is not (easily)
      possible to implement with the blocking gRPC API.
```
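(Illustrative sketch, not part of the PR, of the watchdog idea above in Python; it assumes FlightStreamReader.cancel() is available and uses placeholder names.)
```python
import threading

def read_with_progress_timeout(client, ticket, per_read_timeout=30.0):
    """Cancel the stream if no batch arrives within per_read_timeout seconds.

    client is a pyarrow.flight.FlightClient; ticket is a flight.Ticket.
    """
    reader = client.do_get(ticket)
    timer = None

    def arm():
        nonlocal timer
        if timer is not None:
            timer.cancel()
        timer = threading.Timer(per_read_timeout, reader.cancel)
        timer.start()

    arm()
    try:
        for chunk in reader:
            arm()  # progress was made, so reset the watchdog
            yield chunk.data
    finally:
        if timer is not None:
            timer.cancel()
```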
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+* bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
Review Comment:
I'd probably word this like ~gRPC's default behavior allows one server to
accept many connections from many different clients, but if requests do a lot
of work (as they may under Flight), the server may not be able to keep up.
Configuring the server to limit the number of clients and reject requests more
proactively, and configuring clients to retry with backoff (and potentially
connect to a different node), would give more consistent quality of service.
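(Illustrative sketch, not part of the PR: client-side retry with backoff as described above; the exception types and parameters are assumptions.)
```python
import random
import time
import pyarrow.flight as flight

def call_with_backoff(make_call, max_attempts=5, base_delay=0.1):
    """Retry a Flight call with jittered exponential backoff when the server sheds load."""
    for attempt in range(max_attempts):
        try:
            return make_call()
        except (flight.FlightUnavailableError, flight.FlightTimedOutError):
            if attempt == max_attempts - 1:
                raise
            # A real client might also re-resolve and try a different node here.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# e.g. call_with_backoff(lambda: list(client.list_flights()))
```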
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,110 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+   options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+Round robin balancing can cause every client to have an open connection to
+every server causing an unexpected number of open connections and a depletion
+of resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+Note: "connect" isn't really a connect and we’ve observed that gRPC does not
+give you the actual error until you first try to make a call. This can cause
+error being reported at unexpected times.
+
+Use ListFlights sparingly
+-------------------------
+
+ListFlights endpoint is largely just implemented as a normal GRPC stream
+endpoint and can hit transfer bottlenecks if used too much. To estimate data
+transfer bottleneck:
+5k schemas will serialize to about 1-5 MB/call. Assuming a gRPC localhost
+bottleneck of 3GB/s you can at best serve 600-3000 clients/s.
+
+ARROW-15764_ proposes a caching optimisation for server side, but it was not
+yet implemented.
+
+
+Memory cache client-side
+------------------------
+
+Flight uses gRPC allocator wherever possible.
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
+threads are not necessarily cleaned up (cached thread pool in java parlance).
+glibc malloc clears some per thread state and the default tuning never clears
+caches in some workloads. But you can explicitly tell malloc to dump caches.
+See ARROW-16697_ as an example.
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+
+
+Excessive traffic
+-----------------
+
+There are basically two ways to handle excessive traffic:
+* unbounded thread pool -> everyone gets serviced, but it might take forever.
+This is what you are seeing now.
+bounded thread pool -> Reject connections / requests when under load, and have
+clients retry with backoff. This also gives an opportunity to retry with a
+different node. Not everyone gets serviced but quality of service stays consistent.
+
+Closing unresponsive connections
+--------------------------------
+
+1. A stale connection can be closed using
+   :member:`arrow::flight::FlightClientOptions::stop_token`. This requires recording the
+ stop token at connection establishment time.
Review Comment:
I would much rather they be cookbook samples so that they actually get run
in CI. It's probably good to still have a short snippet here, but I worry about
those kinds of things bit-rotting without them actually getting exercised like
the Cookbook does.
##########
docs/source/cpp/flight.rst:
##########
@@ -172,6 +172,154 @@ request/response. On the server, they can inspect
incoming headers and
fail the request; hence, they can be used to implement custom
authentication methods.
+.. _flight-best-practices:
+
+Best practices
+==============
+
+gRPC
+----
+
+When using default gRPC transport options can be passed to it via
+:member:`arrow::flight::FlightClientOptions::generic_options`. For example:
+
+.. tab-set::
+
+ .. tab-item:: C++
+
+ .. code-block:: cpp
+
+ auto options = FlightClientOptions::Defaults();
+ // Set a very low limit at the gRPC layer to fail all calls
+         options.generic_options.emplace_back(GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH, 4);
+
+ .. tab-item:: Python
+
+ .. code-block:: cpp
+
+ // Set a very low limit at the gRPC layer to fail all calls
+ generic_options = [("GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH", 4)]
+        client = pyarrow.flight.FlightClient(server_uri, generic_options=generic_options)
+
+Also see `best gRPC practices`_ and available `gRPC keys`_.
+
+Re-use clients whenever possible
+--------------------------------
+
+Closing clients causes gRPC to close and clean up connections which can take
+several seconds per connection. This will stall server and client threads if
+done too frequently. Client reuse will avoid this issue.
+
+Don’t round-robin load balance
+------------------------------
+
+`Round robin balancing`_ means every client can have an open connection to
+every server, causing an unexpected number of open connections and depleting
+server resources.
+
+Debugging
+---------
+
+Use netstat to see the number of open connections.
+For debug - env GRPC_VERBOSITY=info GRPC_TRACE=http will print the initial
+headers (on both sides) so you can see if grpc established the connection or
+not. It will also print when a message is sent, so you can tell if the
+connection is open or not.
+gRPC may not report connection errors until a call is actually made.
+Hence, to detect connection errors when creating a client, some sort
+of dummy RPC should be made.
+
+Memory cache management
+-----------------------
+
+Flight tries to reuse allocations made by gRPC to avoid redundant
+data copies. However, this means that those allocations may not
+be tracked by the Arrow memory pool, and that memory usage behavior,
+such as whether free memory is returned to the system, is dependent
+on the allocator that gRPC uses (usually the system allocator).
+
+A quick way of testing: attach to the process with a debugger and call malloc_trim
+or call ReleaseUnused on the system pool.
+
+Excessive traffic
+-----------------
+
+gRPC will spawn an unbounded number of threads for concurrent clients. Those
Review Comment:
This might not actually be true? e.g.
`https://github.com/grpc/grpc/issues/14492`
Ah, it's not actually _unbounded_, just a very large number:
https://github.com/grpc/proposal/blob/master/L30-cpp-control-max-threads-in-SyncServer.md
I think the threads are cleaned up now? So the next sentence may not be right
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]