Mark Broadmore created TINKERPOP-2352:
-----------------------------------------
Summary: Gremlin Python driver default pool size makes Gremlin
keep-alive difficult
Key: TINKERPOP-2352
URL: https://issues.apache.org/jira/browse/TINKERPOP-2352
Project: TinkerPop
Issue Type: Bug
Components: python
Affects Versions: 3.4.5, 3.3.5
Environment: AWS Lambda, Python 3.7 runtime, AWS Neptune.
(AWS Lambda functions can remain in memory and thus hold connections open for
many minutes between invocations)
Reporter: Mark Broadmore
I'm working with a Gremlin database that (like many) terminates connections if
they don't execute any transactions within a timeout period. When we want to run
a traversal we first check our `GraphTraversalSource` by running
`g.V().limit(1).count().next()` and if that raises an exception we know we need
to reconnect before running the actual traversal.
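
For illustration, a minimal sketch of that check-then-reconnect pattern. The names `checked_source`, `probe` and `reconnect` are hypothetical helpers invented for this example, not part of gremlin-python; `reconnect` stands for whatever the application uses to build a fresh `GraphTraversalSource`.

```python
def checked_source(g, reconnect, probe=lambda g: g.V().limit(1).count().next()):
    """Return a traversal source that passed the probe, rebuilding it if the probe fails.

    g         -- the current GraphTraversalSource (or any stand-in object)
    reconnect -- hypothetical zero-argument callable returning a fresh source
    probe     -- the cheap "connection test" traversal from the description
    """
    try:
        probe(g)
        return g
    except Exception:
        # Any failure is treated as "the connection is gone".
        return reconnect()
```

Note that this only proves that whichever connection served the probe is alive, which is the crux of the problem reported here.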
We've been very confused that this hasn't worked as expected: we intermittently
see traversals fail with `WebSocketClosed` or other connection-related errors
immediately after the "connection test" passes.
I've (finally) found that the cause of this inconsistency is the default pool
size in `gremlin_python.driver.client.Client` being 4. This means there's no
visibility outside the `Client` of which connection in the pool is tested and/or
used, and in fact no way for the application (via its `GraphTraversalSource`) to
run keep-alive type traversals reliably. Whenever an application passes in a
pool size of `None` or a number > 1, there's no way to make sure that each and
every connection in the pool actually sends keep-alive traversals to the
remote, _except_ in the case of a single-threaded application, where a tight
loop could issue `pool_size` of them. In that case, though, because the
application is single-threaded, a `pool_size` above 1 won't provide much benefit.
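
To make the failure mode concrete, here is a small self-contained simulation. It does not use the driver at all: `FakeConnection`, `FakePool` and `probe_all` are made-up stand-ins, and the pool is assumed to hand out idle connections in FIFO order, which is a simplification of what the real `Client` does.

```python
from collections import deque


class FakeConnection:
    """Stand-in for one pooled websocket connection."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def send(self, query):
        if not self.alive:
            raise ConnectionError(f"{self.name} was closed by the server")
        return f"{query} ok via {self.name}"


class FakePool:
    """Assumed model: take a connection from the front, return it to the back."""

    def __init__(self, connections):
        self._idle = deque(connections)

    def submit(self, query):
        conn = self._idle.popleft()
        try:
            return conn.send(query)
        finally:
            self._idle.append(conn)


def probe_all(pool, pool_size, probe="keep-alive"):
    """Single-threaded tight loop: one probe per pooled connection; returns failure count."""
    failures = 0
    for _ in range(pool_size):
        try:
            pool.submit(probe)
        except ConnectionError:
            failures += 1
    return failures


# Four connections (the default pool size), one of which has timed out.
pool = FakePool([
    FakeConnection("conn-0"),
    FakeConnection("conn-1", alive=False),
    FakeConnection("conn-2"),
    FakeConnection("conn-3"),
])

check_result = pool.submit("connection test")  # served by conn-0, passes
try:
    pool.submit("real traversal")              # served by conn-1, fails
    real_failed = False
except ConnectionError:
    real_failed = True

stale_count = probe_all(pool, 4)               # touches every connection once
```

In this model the "connection test" passes and the very next traversal fails, matching what we observe. `probe_all` only covers every connection because nothing else is using the pool at the same time.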
I've raised this as a bug because I think a default `pool_size` of 1 would give
much more predictable behaviour, and in the specific case of the Python driver
is probably more appropriate because Python applications tend to run
single-threaded by default, with multi-threading carefully added when
performance requires it. Perhaps this is really a wish, but because the
behaviour of the default option is so confusing it feels more like a bug. If it
would help, I'm happy to raise a PR with some updated function header comments
or maybe updated documentation about multi-threaded / multi-async-loop usage of
gremlin-python.
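
In the meantime, the workaround on the application side is to pass `pool_size=1` explicitly. A rough sketch, assuming the `pool_size` keyword on `DriverRemoteConnection` is forwarded to the `Client` as it appears to be in the affected versions; `single_connection_source` is a hypothetical helper name.

```python
def single_connection_source(url, traversal_source="g"):
    """Build a traversal source backed by exactly one pooled connection."""
    # Imported lazily so the sketch can be loaded without the driver installed.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.structure.graph import Graph

    # pool_size=1 means the probe and the real traversal share one connection.
    remote = DriverRemoteConnection(url, traversal_source, pool_size=1)
    return Graph().traversal().withRemote(remote)
```

With a single connection, a passing `g.V().limit(1).count().next()` check does say something about the connection the next traversal will use.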
(This is my first issue here, apologies if it has some fields wrong.)