ashb commented on code in PR #29625:
URL: https://github.com/apache/airflow/pull/29625#discussion_r1111824731


##########
airflow/utils/entry_points.py:
##########
@@ -28,26 +30,33 @@
 
 log = logging.getLogger(__name__)
 
+EPnD = Tuple[metadata.EntryPoint, metadata.Distribution]
 
-def entry_points_with_dist(group: str) -> Iterator[tuple[metadata.EntryPoint, 
metadata.Distribution]]:
-    """Retrieve entry points of the given group.
-
-    This is like the ``entry_points()`` function from importlib.metadata,
-    except it also returns the distribution the entry_point was loaded from.
 
-    :param group: Filter results to only this entrypoint group
-    :return: Generator of (EntryPoint, Distribution) objects for the specified 
groups
-    """
[email protected]_cache(maxsize=None)
+def _get_grouped_entry_points() -> dict[str, list[EPnD]]:
     loaded: set[str] = set()
+    mapping: dict[str, list[EPnD]] = collections.defaultdict(list)
     for dist in metadata.distributions():
         try:
             key = canonicalize_name(dist.metadata["Name"])

Review Comment:
   I was profling memory usage (not speed), and this seemed to be a cause of a 
lot of bloat -- and it's suprisingly expensive to get the dist name (involves 
parsing a lot of files for _every_ dist in airflow).
   
   I think there are three options here:
   
   1. Only compute this key if the entrypoint group matches (this limits the 
expensive operation to just dists we actually care about, instead of _all_
   2. Use the path component to be the cache key (see below)
   3. Cache based on the entrypoint classpath
   4. Don't cache at all. This only catches a case when you have multiple 
copies of the same dist in multiple cases (which is _rare_ to hit outside of 
being an Airflow developer anyway).
   
   On point 2, it doesn't seem possible to do this using public methods, but 
this:
   
   ```python console
   In [14]: d._path
   Out[14]: 
PosixPath('/home/ash/airflow/.venv/lib/python3.11/site-packages/greenlet-2.0.2.dist-info')
   
   In [15]: d._path.stem
   Out[15]: 'greenlet-2.0.2'
   ```
   
   Doing this might break for less common ways of shipping dists though, so 
it's likely not a good option, not without a fallback anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to