ashb commented on code in PR #29625: URL: https://github.com/apache/airflow/pull/29625#discussion_r1111824731
########## airflow/utils/entry_points.py: ########## @@ -28,26 +30,33 @@ log = logging.getLogger(__name__) +EPnD = Tuple[metadata.EntryPoint, metadata.Distribution] -def entry_points_with_dist(group: str) -> Iterator[tuple[metadata.EntryPoint, metadata.Distribution]]: - """Retrieve entry points of the given group. - - This is like the ``entry_points()`` function from importlib.metadata, - except it also returns the distribution the entry_point was loaded from. - :param group: Filter results to only this entrypoint group - :return: Generator of (EntryPoint, Distribution) objects for the specified groups - """ [email protected]_cache(maxsize=None) +def _get_grouped_entry_points() -> dict[str, list[EPnD]]: loaded: set[str] = set() + mapping: dict[str, list[EPnD]] = collections.defaultdict(list) for dist in metadata.distributions(): try: key = canonicalize_name(dist.metadata["Name"]) Review Comment: I was profling memory usage (not speed), and this seemed to be a cause of a lot of bloat -- and it's suprisingly expensive to get the dist name (involves parsing a lot of files for _every_ dist in airflow). I think there are three options here: 1. Only compute this key if the entrypoint group matches (this limits the expensive operation to just dists we actually care about, instead of _all_ 2. Use the path component to be the cache key (see below) 3. Cache based on the entrypoint classpath 4. Don't cache at all. This only catches a case when you have multiple copies of the same dist in multiple cases (which is _rare_ to hit outside of being an Airflow developer anyway). On point 2, it doesn't seem possible to do this using public methods, but this: ```python console In [14]: d._path Out[14]: PosixPath('/home/ash/airflow/.venv/lib/python3.11/site-packages/greenlet-2.0.2.dist-info') In [15]: d._path.stem Out[15]: 'greenlet-2.0.2' ``` Doing this might break for less common ways of shipping dists though, so it's likely not a good option, not without a fallback anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
