Re: [PR] [Python]Enable state cache to 100 MB [beam]

via GitHub Mon, 30 Oct 2023 10:59:58 -0700


tvalentyn commented on code in PR #28781:
URL: https://github.com/apache/beam/pull/28781#discussion_r1376618265



##########
sdks/python/apache_beam/runners/worker/sdk_worker_main.py:
##########
@@ -241,27 +242,26 @@ def _parse_pipeline_options(options_json):
   return PipelineOptions.from_dictionary(_load_pipeline_options(options_json))
 
 
-def _get_state_cache_size(options, experiments):
-  """Defines the maximum size of the cache in megabytes.
+def _get_state_cache_size_bytes(options):
+  """Return the maximun size of state cache in bytes.

Review Comment:
   There is an inconsistency between bytes and megabytes.
   nit: typo ("maximun" should be "maximum").
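The sketch below shows one way to keep the unit explicit. It assumes the option is given in megabytes, that 1 MB means 2**20 bytes, and that an unset option falls back to the 100 MB default from the help text. The function name `state_cache_size_bytes` and the `DEFAULT_CACHE_MB` constant are hypothetical, and this is not the code in the PR.

```python
from typing import Optional

# Hypothetical default, mirroring the "Default is 100MB" help text.
DEFAULT_CACHE_MB = 100


def state_cache_size_bytes(max_cache_memory_usage_mb: Optional[int]) -> int:
  """Return the maximum state cache size in bytes.

  The option is expressed in megabytes. This sketch treats 1 MB as 2**20
  bytes and falls back to DEFAULT_CACHE_MB when the option is unset (None).
  """
  if max_cache_memory_usage_mb is None:
    max_cache_memory_usage_mb = DEFAULT_CACHE_MB
  if max_cache_memory_usage_mb < 0:
    raise ValueError(
        f'cache size must be non-negative, got {max_cache_memory_usage_mb} MB')
  # Shifting left by 20 bits multiplies by 1024 * 1024.
  return max_cache_memory_usage_mb << 20


print(state_cache_size_bytes(None))  # 104857600
print(state_cache_size_bytes(0))     # 0
```

The conversion happens once, where the MB option is read. That keeps every downstream size in a single unit, and the `_bytes` suffix on the name states which unit that is.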



##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -1128,15 +1128,16 @@ def _add_argparse_args(cls, parser):
         type=str,
         help='GCE minimum CPU platform. Default is determined by GCP.')
     parser.add_argument(
-        '--state_cache_size',
-        '--state_cache_size_mb',
-        dest='state_cache_size',
+        '--max_cache_memory_usage_mb',
+        dest='max_cache_memory_usage_mb',
         type=int,
         default=None,

Review Comment:
   Any concerns about defining the 100 MB default here?
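The trade-off behind this question can be sketched with a plain `argparse.ArgumentParser` standing in for the `parser` passed to `_add_argparse_args`. This is an assumption, because Beam's option plumbing may add behavior on top. A default at the flag means the parsed value is always an int. With `default=None`, "unset" stays distinguishable from an explicit value, but the default then has to be resolved somewhere else. `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser(default_mb):
  """Build a parser with the flag from the diff and a configurable default."""
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--max_cache_memory_usage_mb',
      dest='max_cache_memory_usage_mb',
      type=int,
      default=default_mb,
      help='Size of the SDK process cache in MB (default: %(default)s).')
  return parser


# Default defined at the flag: the parsed value is always an int.
eager = build_parser(default_mb=100)
print(eager.parse_args([]).max_cache_memory_usage_mb)  # 100

# Default left as None: the consumer must substitute 100 itself, but it can
# tell "not set" apart from an explicit value such as 0.
lazy = build_parser(default_mb=None)
print(lazy.parse_args([]).max_cache_memory_usage_mb)  # None
explicit = lazy.parse_args(['--max_cache_memory_usage_mb', '0'])
print(explicit.max_cache_memory_usage_mb)  # 0
```

When the default lives at the flag, the help string can use argparse's `%(default)s` placeholder instead of repeating "Default is 100MB." by hand, so the two cannot drift apart.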



##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -1128,15 +1128,16 @@ def _add_argparse_args(cls, parser):
         type=str,
         help='GCE minimum CPU platform. Default is determined by GCP.')
     parser.add_argument(
-        '--state_cache_size',
-        '--state_cache_size_mb',
-        dest='state_cache_size',
+        '--max_cache_memory_usage_mb',
+        dest='max_cache_memory_usage_mb',
         type=int,
         default=None,
         help=(
-            'Size of the state cache in MB. Default is 100MB.'
-            'State cache is per process and is shared between all threads '
-            'within the process.'))
+            'Size of the SdkHarness/Sdk Process cache in MB. Default is 100MB.'
+            'This cache is used to store the user state and side input '
+            'elements. If the cache is full, the least recently used '
+            'elements will be evicted. This cache will be per SdkHarness/Sdk '
+            'Process. SDKHarness is a python process that runs the user code.'))

Review Comment:
   ```suggestion
               'Process. Depending on the runner, there may be more than 1 process running on the same worker node.'))
   ```
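The help text describes a cache that lives in each process and evicts the least recently used elements when it is full. The sketch below shows that policy under two assumptions: the budget is a number of bytes, and the caller supplies each entry's size. The `ByteBudgetLRUCache` class is hypothetical and is not Beam's state cache implementation.

```python
from collections import OrderedDict


class ByteBudgetLRUCache:
  """Hypothetical LRU cache bounded by the total size of its entries."""

  def __init__(self, max_bytes):
    self._max_bytes = max_bytes
    self._current_bytes = 0
    # Order of the OrderedDict tracks recency: least recently used first.
    self._entries = OrderedDict()  # key -> (value, size_bytes)

  def get(self, key):
    if key not in self._entries:
      return None
    # Mark the entry as most recently used.
    self._entries.move_to_end(key)
    return self._entries[key][0]

  def put(self, key, value, size_bytes):
    # Drop any previous version of this key first.
    if key in self._entries:
      self._current_bytes -= self._entries.pop(key)[1]
    if size_bytes > self._max_bytes:
      return  # Can never fit within the budget; leave it uncached.
    self._entries[key] = (value, size_bytes)
    self._current_bytes += size_bytes
    # Evict least recently used entries until back within the budget.
    while self._current_bytes > self._max_bytes:
      _, (_, evicted_size) = self._entries.popitem(last=False)
      self._current_bytes -= evicted_size

  def __contains__(self, key):
    return key in self._entries

  @property
  def size_bytes(self):
    return self._current_bytes


cache = ByteBudgetLRUCache(max_bytes=100)
cache.put('a', 'A', 40)
cache.put('b', 'B', 40)
cache.get('a')           # 'a' becomes the most recently used entry.
cache.put('c', 'C', 40)  # 120 > 100, so 'b' (least recently used) is evicted.
print('a' in cache, 'b' in cache, 'c' in cache)  # True False True
```

Each SDK process would hold its own instance, so the worst case on a worker node scales with the process count. For example, a node running 4 SDK processes with a 100 MB budget each could use up to 4 × 100 = 400 MB for these caches. The process count here is illustrative.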



-- 
This is an automated message from the Apache Git Service.
