Strikerrx01 commented on code in PR #34135:
URL: https://github.com/apache/beam/pull/34135#discussion_r1979589935


##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -386,8 +390,28 @@ def __init__(self, client=None, temp_dataset_id=None, 
temp_table_ref=None):
       self._temporary_table_suffix = uuid.uuid4().hex
       self.temp_dataset_id = temp_dataset_id or self._get_temp_dataset()
 
+    # Initialize table definition cache with default TTL of 1 hour
+    # Cache entries are invalidated after TTL expires to ensure fresh metadata
+    self._table_cache = {}

Review Comment:
   @sjvanrossum Thanks for the guidance. I'll evaluate the available caching 
packages carefully:
   
   1. For functools:
   - `@functools.lru_cache` - Thread-safe and size-bounded, but no TTL support
   - `@functools.cache` - Simple (equivalent to `lru_cache(maxsize=None)`) but no size limits or TTL
   
   2. For cachetools:
   - `TTLCache` - Has TTL, but is not thread-safe on its own; callers must guard access with an external lock
   - `LRUCache` - Good size management but no TTL
   - `cached` decorator - Combines features but may have lock contention
   
   3. Other options:
   - `fastcache` - C implementation, very fast but less flexible
   - `pylru` - Pure Python, good for LRU but no TTL
   
   Since we need:
   - TTL for quick schema propagation (1s)
   - Thread safety for hundreds of concurrent threads
   - Size limits to prevent memory issues
   - Good lock scaling for high concurrency
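   
   As a rough pure-stdlib sketch (hypothetical names, not a proposed implementation), the four requirements above could look like this before we commit to a library:
   
   ```python
   import threading
   import time
   from collections import OrderedDict
   
   class TTLLRUCache:
       """Sketch: size-bounded LRU cache with per-entry TTL behind one lock."""
   
       def __init__(self, maxsize=1024, ttl=1.0):
           self._maxsize = maxsize
           self._ttl = ttl
           self._lock = threading.Lock()
           self._data = OrderedDict()  # key -> (expires_at, value)
   
       def get(self, key, default=None):
           with self._lock:
               entry = self._data.get(key)
               if entry is None:
                   return default
               expires_at, value = entry
               if time.monotonic() >= expires_at:
                   del self._data[key]  # stale: drop so callers refetch the schema
                   return default
               self._data.move_to_end(key)  # mark as recently used
               return value
   
       def put(self, key, value):
           with self._lock:
               self._data[key] = (time.monotonic() + self._ttl, value)
               self._data.move_to_end(key)
               while len(self._data) > self._maxsize:
                   self._data.popitem(last=False)  # evict least recently used
   ```
   
   The single lock is the weak point here, which is exactly why I want to measure contention before choosing.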
   
   I'll do some performance testing of the different options, focusing on lock 
contention under high thread counts. Would you like to see the comparison 
results before I choose an implementation?
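   
   For that comparison I'm thinking of a harness along these lines (hypothetical sketch; `cache_get` stands in for whichever candidate's read path is under test):
   
   ```python
   import threading
   import time
   
   def measure_contention(cache_get, n_threads=8, n_ops=10_000):
       """Time n_threads concurrently issuing n_ops cache reads each.
   
       Returns the slowest per-thread wall time, which grows with lock
       contention as the thread count rises.
       """
       barrier = threading.Barrier(n_threads)
       durations = []
       results_lock = threading.Lock()
   
       def worker():
           barrier.wait()  # release all threads at once to maximize contention
           start = time.perf_counter()
           for i in range(n_ops):
               cache_get(i % 100)  # hot working set of 100 keys
           elapsed = time.perf_counter() - start
           with results_lock:
               durations.append(elapsed)
   
       threads = [threading.Thread(target=worker) for _ in range(n_threads)]
       for t in threads:
           t.start()
       for t in threads:
           t.join()
       return max(durations)
   ```
   
   Running it across thread counts (e.g. 8, 64, 256) for each candidate should show how the single-lock designs scale.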



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]