potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911812067


   Ah right. The last line you wrote is GOLD.
   
   That would probably explain it, and it's NOT AN ISSUE.
   
   When you open many files, Linux will use as much memory as it can for
file caches (the page cache). Whenever you read or write a file, its disk blocks
are also kept in memory in case any process needs to access the file again. If
cached blocks are modified, they are marked dirty and written back to disk;
clean pages can then be evicted. And when a process needs more memory than is
currently free, the kernel evicts unused cache pages to make room. So on any
system that continuously writes log files that are not modified later, the
cache will grow CONTINUOUSLY until it reaches the limit allowed by the kernel
configuration.
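
   To see this split on a Linux machine, here is a minimal Python sketch (an
illustration only, not part of Airflow or any tool mentioned here) that parses
`/proc/meminfo` text and separates the page cache (`Buffers` + `Cached`) from
the rest of the used memory. The function names are hypothetical; the sketch
assumes the usual `/proc/meminfo` format, where most fields are in kB, and a
kernel that reports `MemAvailable` (3.14 or later).

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: bytes}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        # Most fields are in kB (KiB); a few, such as HugePages_Total, are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def cache_summary(text: str) -> dict[str, int]:
    """Split used memory into page cache and everything else (all in bytes)."""
    m = parse_meminfo(text)
    used = m["MemTotal"] - m["MemFree"]
    page_cache = m.get("Buffers", 0) + m.get("Cached", 0)
    return {
        "used_including_cache": used,
        "page_cache": page_cache,
        "used_excluding_cache": used - page_cache,
        # Kernel's estimate of memory usable without swapping, reclaimable cache included.
        "available": m["MemAvailable"],
    }
```

   On a Linux host you could call `cache_summary(open("/proc/meminfo").read())`;
on a machine that keeps writing logs, `page_cache` is typically the part that
keeps growing while `available` stays high.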
   
   So depending on your kernel configuration (basically the kernel of the
virtual machines running your Kubernetes nodes under the hood), you will see the
metrics grow continuously (up to the kernel limit). You can cap this per
container by giving your scheduler container a lower memory limit, but however
much memory you give the scheduler container, after some time it will be used
for cache. That memory is not explicitly freed, but this is not a problem: it is
effectively "free", since it is only used for cache and can be reclaimed
immediately when needed.
   
   That would PERFECTLY explain why the memory drops immediately after the
files are deleted: when a file is deleted, the system also discards its cached
pages right away.
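
   The growth, reclaim and deletion behaviour described above can be
illustrated with a deliberately simplified toy model. It is a sketch under
strong assumptions, not how the Linux kernel actually manages pages (the real
kernel uses active/inactive LRU lists, dirty-page writeback and per-cgroup
accounting): a fixed budget of pages is shared by process memory and an LRU
cache of file blocks. The class `ToyPageCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy model (NOT the real kernel algorithm): one memory budget shared by
    process memory and a least-recently-used cache of file blocks."""

    def __init__(self, limit_pages: int) -> None:
        self.limit = limit_pages
        self.anon = 0                # pages the process itself actively uses
        self.cache = OrderedDict()   # (path, block) -> None, oldest entry first

    def used(self) -> int:
        # What a naive "memory used" metric reports: process memory plus cache.
        return self.anon + len(self.cache)

    def _evict_until_fits(self, extra: int) -> None:
        # Drop the least recently used cached blocks until `extra` more pages fit.
        while self.cache and self.used() + extra > self.limit:
            self.cache.popitem(last=False)

    def write_block(self, path: str, block: int) -> None:
        # Writing (or reading) a file block keeps a copy of it in the cache.
        key = (path, block)
        if key in self.cache:
            self.cache.move_to_end(key)
            return
        self._evict_until_fits(1)
        if self.used() < self.limit:
            self.cache[key] = None

    def allocate(self, pages: int) -> None:
        # A process asking for memory gets it by reclaiming cache pages first.
        self._evict_until_fits(pages)
        if self.used() + pages > self.limit:
            raise MemoryError("not enough memory even after dropping the cache")
        self.anon += pages

    def delete_file(self, path: str) -> None:
        # Deleting a file discards its cached blocks immediately.
        for key in [k for k in self.cache if k[0] == path]:
            del self.cache[key]
```

   With a 100-page limit, a process holding 10 pages that writes 500 log blocks
leaves `used()` pinned at 100; a later `allocate(50)` still succeeds by evicting
cache, and `delete_file()` lowers the reported usage at once, which is the same
pattern as the metric drop seen after deleting the log files.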
   
   Instead of looking at the total memory used, you should look at the
**container_memory_working_set_bytes** metric. It reflects the memory that is
actually "actively used" (the container's usage minus its inactive file cache).
You can read more here:
https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e66
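
   As a rough sketch of what that metric means: to our understanding, cAdvisor
(which exports `container_memory_working_set_bytes`) computes it as the
cgroup's memory usage minus its inactive file cache, floored at zero. The Python
helpers below (hypothetical names) reproduce that arithmetic from the text of a
cgroup `memory.stat` file; they read the cgroup v1 hierarchical field
`total_inactive_file` when it is present and otherwise the cgroup v2 field
`inactive_file`.

```python
def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse cgroup memory.stat text ("key value" per line) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, memory_stat_text: str) -> int:
    """Working set = usage minus inactive file cache, never below zero."""
    stats = parse_memory_stat(memory_stat_text)
    # cgroup v1 reports the hierarchical total as "total_inactive_file";
    # cgroup v2 only has "inactive_file".
    inactive_file = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage_bytes - inactive_file, 0)
```

   Inside a cgroup v2 container, `usage_bytes` would typically come from
`/sys/fs/cgroup/memory.current` and the stat text from
`/sys/fs/cgroup/memory.stat` (assumed paths; they depend on how cgroups are
mounted). Log blocks that were written once and not touched again tend to sit on
the inactive file list, so they do not count towards the working set.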
   
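   You can also test this by running the following in the container (from
https://linuxhint.com/clear_cache_linux/):
   
   `echo 1 > /proc/sys/vm/drop_caches`
   
   This should drop your caches immediately without deleting the files. Note
that it needs root, and usually a privileged container, because /proc/sys is
normally mounted read-only inside containers. The setting is not per-container:
it drops the clean page cache for the whole node, so running `sync` first lets
dirty pages be written back and dropped as well.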


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
