[ https://issues.apache.org/jira/browse/TINKERPOP-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715991#comment-17715991 ]
ASF GitHub Bot commented on TINKERPOP-2820: ------------------------------------------- codecov-commenter commented on PR #2042: URL: https://github.com/apache/tinkerpop/pull/2042#issuecomment-1520834502 ## [Codecov](https://codecov.io/gh/apache/tinkerpop/pull/2042?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report > Merging [#2042](https://codecov.io/gh/apache/tinkerpop/pull/2042?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ef4d4f7) into [3.5-dev](https://codecov.io/gh/apache/tinkerpop/commit/2217db0d82160c469f30607fc9492d6553a57190?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2217db0) will **decrease** coverage by `5.24%`. > The diff coverage is `n/a`. ```diff @@ Coverage Diff @@ ## 3.5-dev #2042 +/- ## ============================================= - Coverage 69.39% 64.16% -5.24% ============================================= Files 866 25 -841 Lines 41242 3750 -37492 Branches 5442 0 -5442 ============================================= - Hits 28621 2406 -26215 + Misses 10716 1166 -9550 + Partials 1905 178 -1727 ``` [see 841 files with indirect coverage changes](https://codecov.io/gh/apache/tinkerpop/pull/2042/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) > gremlin-python _close_session race condition/FD leak > ---------------------------------------------------- > > Key: TINKERPOP-2820 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2820 > Project: TinkerPop > Issue Type: Bug > Components: python > Affects Versions: 3.6.1 > Reporter: Alex Hamilton > Assignee: Valentyn Kahamlyk > Priority: Critical > > There is a race condition in gremlin-python when closing session-based > connections that results in leaking file descriptors for event loops - > eventually leading to an `OSError [Errno 24] too many open files` error after > enough transactions occur. > The problem stems from a race condition when closing session based > connections that causes the event loop opened for the session's connection to > be left open. > The problem is completely contained in these two methods from > `gremlin_python.driver.client.py` > ```py > def close(self): > # prevent the Client from being closed more than once. it raises errors > if new jobby jobs > # get submitted to the executor when it is shutdown > if self._closed: > return > if self._session_enabled: > self._close_session() # 1. (see below) > log.info("Closing Client with url '%s'", self._url) > while not self._pool.empty(): # 3. (see below) > conn = self._pool.get(True) > conn.close() > self._executor.shutdown() > self._closed = True > def _close_session(self): > message = request.RequestMessage( > processor='session', op='close', > args=\{'session': str(self._session)}) > conn = self._pool.get(True) > return conn.write(message).result() # 2. (see below) > ``` > 1. `_close_session()` called > 2. `.result()` waits for the _write_ to finish, but does *not* wait for the > _read_ to finish. `conn` does not get put back into `self._pool` until AFTER > the read finishes (`gremlin_python.driver.connection.Connection._receive()`). > However, this method returns early and goes to 3. > 3. this while loop is not entered to close out the connections. This leaves > the conn's event loop running, never to be closed. > I was able to solve this by modifying `_close_session` as follows: > ```py > def _close_session(self): > message = request.RequestMessage( > processor='session', op='close', > args=\{'session': str(self._session)}) > conn = self._pool.get(True) > try: > write_result_set = conn.write(message).result() > return write_result_set.all().result() # wait for _receive() to finish > except protocol.GremlinServerError: > pass > ``` > I'm not sure if this is the correct solution, but wanted to point out the bug. > In the meantime however, I wrote a context manager to handle this cleanup for > me > ```py > @contextlib.contextmanager > def transaction(): > tx = g.tx() > gtx = tx.begin() > try: > yield tx, gtx > tx.commit() > except Exception as e: > tx.rollback() > finally: > while not tx._session_based_connection._client._pool.empty(): > conn = tx._session_based_connection._client._pool.get(True) > conn.close() > logger.info("Closed abandoned session connection") > with transaction() as (tx, gtx): > foo = gtx.some_traversal().to_list() > # do something with foo > gtx.some_other_traversal().iterate() > ``` > Cheers -- This message was sent by Atlassian Jira (v8.20.10#820010)