[ 
https://issues.apache.org/jira/browse/TINKERPOP-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717765#comment-17717765
 ] 

ASF GitHub Bot commented on TINKERPOP-2820:
-------------------------------------------

vkagamlyk merged PR #2051:
URL: https://github.com/apache/tinkerpop/pull/2051




> gremlin-python _close_session race condition/FD leak
> ----------------------------------------------------
>
>                 Key: TINKERPOP-2820
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2820
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 3.6.1
>            Reporter: Alex Hamilton
>            Assignee: Valentyn Kahamlyk
>            Priority: Critical
>             Fix For: 3.7.0, 3.6.3, 3.5.6
>
>
> There is a race condition in gremlin-python when closing session-based 
> connections that results in leaking file descriptors for event loops - 
> eventually leading to an `OSError [Errno 24] too many open files` error after 
> enough transactions occur.
> The problem stems from a race condition when closing session based 
> connections that causes the event loop opened for the session's connection to 
> be left open.
> The problem is completely contained in these two methods from 
> `gremlin_python.driver.client.py`
> ```py
> def close(self):
>     # prevent the Client from being closed more than once. it raises errors 
> if new jobby jobs
>     # get submitted to the executor when it is shutdown
>     if self._closed:
>         return
>     if self._session_enabled:
>         self._close_session() # 1. (see below)
>     log.info("Closing Client with url '%s'", self._url)
>     while not self._pool.empty(): # 3. (see below)
>         conn = self._pool.get(True)
>         conn.close()
>     self._executor.shutdown()
>     self._closed = True
> def _close_session(self):
>     message = request.RequestMessage(
>         processor='session', op='close',
>         args=\{'session': str(self._session)})
>     conn = self._pool.get(True) 
>     return conn.write(message).result() # 2. (see below)
> ```
> 1. `_close_session()` called
> 2. `.result()` waits for the _write_ to finish, but does *not* wait for the 
> _read_ to finish. `conn` does not get put back into `self._pool` until AFTER 
> the read finishes (`gremlin_python.driver.connection.Connection._receive()`). 
> However, this method returns early and goes to 3.
> 3. this while loop is not entered to close out the connections. This leaves 
> the conn's event loop running, never to be closed.
> I was able to solve this by modifying `_close_session` as follows:
> ```py
> def _close_session(self):
>     message = request.RequestMessage(
>         processor='session', op='close',
>         args=\{'session': str(self._session)})
>     conn = self._pool.get(True)
>     try:
>         write_result_set = conn.write(message).result()
>         return write_result_set.all().result() # wait for _receive() to finish
>     except protocol.GremlinServerError:
>         pass
> ```
> I'm not sure if this is the correct solution, but wanted to point out the bug.
> In the meantime however, I wrote a context manager to handle this cleanup for 
> me
> ```py
> @contextlib.contextmanager
> def transaction():
>     tx = g.tx()
>     gtx = tx.begin()
>     try:
>         yield tx, gtx
>         tx.commit()
>     except Exception as e:
>         tx.rollback()
>     finally:
>         while not tx._session_based_connection._client._pool.empty():
>             conn = tx._session_based_connection._client._pool.get(True)
>             conn.close()
>             logger.info("Closed abandoned session connection")
> with transaction() as (tx, gtx):
>     foo = gtx.some_traversal().to_list()
>     # do something with foo
>     gtx.some_other_traversal().iterate()
> ```
> Cheers



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to