Strikerrx01 commented on code in PR #34135:
URL: https://github.com/apache/beam/pull/34135#discussion_r1979589935
##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -386,8 +390,28 @@ def __init__(self, client=None, temp_dataset_id=None, temp_table_ref=None):
     self._temporary_table_suffix = uuid.uuid4().hex
     self.temp_dataset_id = temp_dataset_id or self._get_temp_dataset()
+    # Initialize table definition cache with default TTL of 1 hour
+    # Cache entries are invalidated after TTL expires to ensure fresh metadata
+    self._table_cache = {}

Review Comment:
@sjvanrossum Thanks for the guidance. I'll evaluate the available caching packages carefully:

1. functools:
   - `@functools.lru_cache` - thread-safe, but no TTL support
   - `@functools.cache` - simple, but no size limit or TTL
2. cachetools:
   - `TTLCache` - supports TTL, but is not thread-safe on its own, so access has to be synchronized (e.g. via the `cached` decorator's `lock` argument)
   - `LRUCache` - good size management, but no TTL
   - `cached` decorator - combines a cache with an optional lock, but a single shared lock may contend under load
3. Other options:
   - `fastcache` - C implementation, very fast, but less flexible
   - `pylru` - pure Python, good for LRU, but no TTL

Since we need:
- a short TTL (~1s) for quick schema propagation
- thread safety for hundreds of concurrent threads
- a size limit to keep memory bounded
- locking that scales under high concurrency

do you have a recommendation on which package would be most suitable for our use case? I'm particularly interested in your thoughts on lock scaling with high thread counts, since you mentioned this cache could be accessed by hundreds of threads. I can also run performance tests with the different options, focusing on lock contention under high thread counts, if that would be helpful; rough sketches of what I'd test are below.
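For concreteness, here's the kind of wrapper I'd start from for option 2. This is a minimal sketch, not what the PR currently does; `get_table_metadata` and its parameters are placeholder names for the real lookup in `BigQueryWrapper`:

```python
import threading

from cachetools import TTLCache, cached

# Short TTL so schema changes propagate within ~1s; maxsize bounds memory.
_table_cache = TTLCache(maxsize=1024, ttl=1)
_table_cache_lock = threading.Lock()


# `cached` holds the lock only around cache reads/writes; the miss path runs
# outside the lock, so concurrent misses on the same key may duplicate the RPC.
@cached(cache=_table_cache, lock=_table_cache_lock)
def get_table_metadata(project_id, dataset_id, table_id):
  # Placeholder for the real client call (e.g. a tables.Get RPC).
  raise NotImplementedError
```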
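If that single lock turns out to be the bottleneck at hundreds of threads, the variant I'd benchmark against it is lock sharding: N independent `TTLCache` instances, each behind its own lock, selected by key hash. Shard count and helper names here are made up for illustration:

```python
import threading

from cachetools import TTLCache

_NUM_SHARDS = 16  # tunable: more shards means less contention per lock

# Threads looking up different tables usually hash to different shards,
# so they rarely contend on the same lock.
_shards = [(TTLCache(maxsize=64, ttl=1), threading.Lock())
           for _ in range(_NUM_SHARDS)]


def cached_lookup(key, load_fn):
  cache, lock = _shards[hash(key) % _NUM_SHARDS]
  with lock:
    try:
      return cache[key]
    except KeyError:
      pass
  value = load_fn(key)  # run the expensive lookup outside the lock
  with lock:
    cache[key] = value  # last writer wins on concurrent misses
  return value
```

A benchmark would then be N threads hammering `cached_lookup` over a small key set, comparing throughput at shard count 1 (equivalent to the single-lock version) vs. 16 or 32.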