sodonnel commented on pull request #3254: URL: https://github.com/apache/ozone/pull/3254#issuecomment-1084726729
I read through the doc and the change here, and I am not sure tracking space is the correct way to solve this. I know that EC files should be large, but they will not always be. Also, a large cluster does not always need a lot of EC pipelines open: sometimes the main load may be from reads, or there is some rarely used EC policy (eg a tiny number of writes are using EC-3-2).

That is why I think the "block allocation rate" is the best way to gauge the number of pipelines we need. We could start with some sensible, but configurable, minimum and some upper bound based on the registered nodes. Then we keep track of the block allocation requests per time period in the ECWritableContainerProvider, per EC policy.

We can guess the time it takes for a client to write a full block - it will always be approximate. We don't know how much of the block will be filled, or whether the writer is a slow writer streaming events or a fast writer. We do know the max MB they can write, as it will be `blockSize * Required_Nodes_For_EC_Policy`; for 6-3, that will be `256 * 9 = 2304MB`. The data is written mostly serially, so if we guess 150MB/s, it will take about 15 seconds to write that block. We then scale that number back by some factor, as not all blocks will be filled - eg assume it is 50% of that.

If we are seeing 10 block requests per second for an EC policy, and it takes 15 seconds to write the full block, perhaps we need 10 * 15 = 150 pipelines, or we can scale that by `block_fill_factor`. If the load drops to 1 request per second, we only need 15.

The other thing we need to consider is that Ratis pipelines can have many open containers on a single pipeline, and each container is constrained to a single disk. An EC pipeline only has a single container and hence a single disk on a DN. So we need to consider the number of disks on the DNs as well as the number of nodes.

I am not sure how fine grained we would need to track the request rate, eg per second, per 10 seconds, per minute.
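To make the arithmetic above concrete, here is a minimal sketch of the sizing idea. Everything here is hypothetical (the class name, constants, and clamping behaviour are not part of the Ozone codebase), but it shows how the request rate, estimated block write time, and fill factor could combine into a pipeline count bounded by a configurable minimum and a node-based maximum:

```java
// Back-of-the-envelope sketch of the sizing idea; all names and
// constants are illustrative assumptions, not real Ozone code.
public class EcPipelineEstimator {

  // Assumed values from the discussion above.
  static final double BLOCK_SIZE_MB = 256;          // per-replica block size
  static final double WRITE_SPEED_MB_PER_SEC = 150; // guessed serial write speed
  static final double BLOCK_FILL_FACTOR = 0.5;      // assume blocks are ~50% full

  /**
   * Estimate the pipelines needed for one EC policy, clamped to a
   * configurable minimum and an upper bound derived from node count.
   */
  static int estimatePipelines(double blockRequestsPerSec,
                               int requiredNodes,
                               int minPipelines,
                               int maxPipelines) {
    double fullBlockMb = BLOCK_SIZE_MB * requiredNodes;            // eg 2304MB for 6-3
    double secondsPerBlock = fullBlockMb / WRITE_SPEED_MB_PER_SEC; // ~15.4s at 150MB/s
    double needed = blockRequestsPerSec * secondsPerBlock * BLOCK_FILL_FACTOR;
    return (int) Math.min(maxPipelines,
        Math.max(minPipelines, Math.ceil(needed)));
  }

  public static void main(String[] args) {
    // EC-6-3 at 10 req/s: 10 * 15.36s * 0.5 -> ~77 pipelines.
    System.out.println(estimatePipelines(10, 9, 5, 500));
    // Load dropping to 1 req/s shrinks the estimate accordingly.
    System.out.println(estimatePipelines(1, 9, 5, 500));
  }
}
```

With the fill factor applied the 10-req/s case lands around 77 pipelines rather than the unscaled 150, which is exactly the `block_fill_factor` scaling mentioned above.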
Or should we have something like the Linux top command, where it has the 1, 5 and 15 minute averages? And if we did have that, how would we use it?

I feel the existing close logic should handle containers filling OK without having to worry about it in the WritableContainerProvider. The DN triggers the close at some percentage full, expecting more blocks will continue to be written. For EC containers the problem is even smaller, as the blocks are spread across the replicas more than with Ratis.
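If the top-style averages were wanted, one way to get them is an exponentially weighted moving average per decay window, updated on a fixed tick. This is a hypothetical sketch (class name, tick interval, and windows are all illustrative, and this is one possible smoothing scheme rather than anything in Ozone today):

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: track the block allocation rate with EWMAs over
// 1/5/15 minute windows, similar to the load averages shown by top.
// Names and decay windows are illustrative, not real Ozone code.
public class AllocationRateMeter {

  private final LongAdder count = new LongAdder();
  private double rate1m, rate5m, rate15m; // requests per second
  private boolean initialized = false;

  // Called from the allocation path, eg on each allocateBlock request.
  public void markRequest() {
    count.increment();
  }

  // Called by a scheduler once per tick (say every 5 seconds).
  public synchronized void tick(double tickSeconds) {
    double instantRate = count.sumThenReset() / tickSeconds;
    if (!initialized) {
      rate1m = rate5m = rate15m = instantRate;
      initialized = true;
      return;
    }
    rate1m = ewma(rate1m, instantRate, tickSeconds, 60);
    rate5m = ewma(rate5m, instantRate, tickSeconds, 300);
    rate15m = ewma(rate15m, instantRate, tickSeconds, 900);
  }

  // Standard EWMA update: newer samples count more, older ones decay.
  private static double ewma(double avg, double sample,
                             double tickSeconds, double windowSeconds) {
    double alpha = 1 - Math.exp(-tickSeconds / windowSeconds);
    return avg + alpha * (sample - avg);
  }

  public synchronized double oneMinuteRate() { return rate1m; }

  public static void main(String[] args) {
    AllocationRateMeter meter = new AllocationRateMeter();
    for (int i = 0; i < 50; i++) {
      meter.markRequest();
    }
    meter.tick(5); // 50 requests in 5s -> 10 req/s
    System.out.println(meter.oneMinuteRate());
  }
}
```

The short window would react quickly to a burst of allocations (grow pipelines fast), while the 15 minute window could gate scale-down, so a brief lull does not immediately close pipelines that will be needed again.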
