[ https://issues.apache.org/jira/browse/TINKERPOP-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717307#comment-17717307 ]
ASF GitHub Bot commented on TINKERPOP-2820: ------------------------------------------- codecov-commenter commented on PR #2051: URL: https://github.com/apache/tinkerpop/pull/2051#issuecomment-1526060171 ## [Codecov](https://codecov.io/gh/apache/tinkerpop/pull/2051?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report > Merging [#2051](https://codecov.io/gh/apache/tinkerpop/pull/2051?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (63cee08) into [3.5-dev](https://codecov.io/gh/apache/tinkerpop/commit/6045c11deb37b2876bde5186975a8546d6a83c87?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6045c11) will **increase** coverage by `0.55%`. > The diff coverage is `n/a`. > :exclamation: Current head 63cee08 differs from pull request most recent head f996d9f. Consider uploading reports for the commit f996d9f to get more accurate results ```diff @@ Coverage Diff @@ ## 3.5-dev #2051 +/- ## ============================================= + Coverage 69.33% 69.89% +0.55% Complexity 8971 8971 ============================================= Files 866 841 -25 Lines 41242 37492 -3750 Branches 5442 5442 ============================================= - Hits 28597 26204 -2393 + Misses 10735 9556 -1179 + Partials 1910 1732 -178 ``` [see 33 files with indirect coverage changes](https://codecov.io/gh/apache/tinkerpop/pull/2051/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) > gremlin-python _close_session race condition/FD leak > ---------------------------------------------------- > > Key: TINKERPOP-2820 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2820 > Project: TinkerPop > Issue Type: Bug > Components: python > Affects Versions: 3.6.1 > Reporter: Alex Hamilton > Assignee: Valentyn Kahamlyk > Priority: Critical > Fix For: 3.7.0, 3.6.3, 3.5.6 > > > There is a race condition in gremlin-python when closing session-based > connections that results in leaking file descriptors for event loops - > eventually leading to an `OSError [Errno 24] too many open files` error after > enough transactions occur. > The problem stems from a race condition when closing session based > connections that causes the event loop opened for the session's connection to > be left open. > The problem is completely contained in these two methods from > `gremlin_python.driver.client.py` > ```py > def close(self): > # prevent the Client from being closed more than once. it raises errors > if new jobby jobs > # get submitted to the executor when it is shutdown > if self._closed: > return > if self._session_enabled: > self._close_session() # 1. (see below) > log.info("Closing Client with url '%s'", self._url) > while not self._pool.empty(): # 3. (see below) > conn = self._pool.get(True) > conn.close() > self._executor.shutdown() > self._closed = True > def _close_session(self): > message = request.RequestMessage( > processor='session', op='close', > args=\{'session': str(self._session)}) > conn = self._pool.get(True) > return conn.write(message).result() # 2. (see below) > ``` > 1. `_close_session()` called > 2. `.result()` waits for the _write_ to finish, but does *not* wait for the > _read_ to finish. `conn` does not get put back into `self._pool` until AFTER > the read finishes (`gremlin_python.driver.connection.Connection._receive()`). > However, this method returns early and goes to 3. > 3. this while loop is not entered to close out the connections. This leaves > the conn's event loop running, never to be closed. > I was able to solve this by modifying `_close_session` as follows: > ```py > def _close_session(self): > message = request.RequestMessage( > processor='session', op='close', > args=\{'session': str(self._session)}) > conn = self._pool.get(True) > try: > write_result_set = conn.write(message).result() > return write_result_set.all().result() # wait for _receive() to finish > except protocol.GremlinServerError: > pass > ``` > I'm not sure if this is the correct solution, but wanted to point out the bug. > In the meantime however, I wrote a context manager to handle this cleanup for > me > ```py > @contextlib.contextmanager > def transaction(): > tx = g.tx() > gtx = tx.begin() > try: > yield tx, gtx > tx.commit() > except Exception as e: > tx.rollback() > finally: > while not tx._session_based_connection._client._pool.empty(): > conn = tx._session_based_connection._client._pool.get(True) > conn.close() > logger.info("Closed abandoned session connection") > with transaction() as (tx, gtx): > foo = gtx.some_traversal().to_list() > # do something with foo > gtx.some_other_traversal().iterate() > ``` > Cheers -- This message was sent by Atlassian Jira (v8.20.10#820010)