fapifta commented on PR #7282: URL: https://github.com/apache/ozone/pull/7282#issuecomment-2423483151
Let me add some background on why I proposed to @len548 to create this tool. In freon we can already generate key ranges, and we can also generate range keys and then read/list those ranges within the same workload. These are good tools, but they work on key ranges in a predictable manner.

We are curious to see how Ozone behaves and utilizes memory. We are also interested in how Ozone behaves when there are frequent cache misses at the RocksDB level or in the on-heap partial cache for key-related data (afaik we also have some on-heap caching for keys; let me know if I am wrong). This is not something you can achieve with predictable key ranges.

So this tool is the reader counterpart of the random key generator, and it reads random keys. If the two are used together, the results are predictable in terms of data volume: the random key generator creates keys of the same size, so whichever key we read, we read the same amount of data from a datanode. The access pattern, however, is random. Given enough keys, my prediction is that there will be a lot of cache misses within the OM. Within the DNs we will also not be reading data that was written at around the same time, so the OS I/O caches will have misses too.

I believe this gives us more insight into a real workload where potentially many clients read different data at scale. It should show us what happens to memory consumption and performance in a highly utilized cluster where any data can potentially be read by multiple clients at the same time. Please correct me if I am wrong, but as I understand it, this level of read unpredictability cannot be reached with the other existing tools.

On a side note, this tool does not really give any insight if the key sizes differ, so it is not meaningful to use it on a real cluster with varying file sizes.
So @adoroszlai, I completely agree with you that this tool does not produce useful info in a real environment. However, combined with the random key generator, I believe the unpredictability of the reads gives useful insight. What do you think?
