fapifta commented on PR #7282:
URL: https://github.com/apache/ozone/pull/7282#issuecomment-2423483151

   Let me add some background on why I proposed to @len548 to create this tool.
   
   In freon we can already generate key ranges, and we can also generate 
range keys and then read/list those ranges within the same workload. These 
are good tools, but they work on key ranges in a predictable manner.
   
   We are curious to see how Ozone behaves and utilizes memory, and we are 
also interested in how Ozone behaves if there are frequent cache misses at 
the RocksDB level or in the on-heap partial cache for key-related data. 
(AFAIK we also have some on-heap caching for keys, let me know if I am wrong.)
   
   This is not something you can achieve with predictable key ranges.
   
   So this tool is the reader counterpart of the random key generator, and 
it reads random keys. If the two are used together, the amount of data read 
per key is predictable: the random key generator generates keys of the same 
size, so whichever key we read, we read the same amount of data from a 
datanode. The access pattern, on the other hand, is random, so my prediction 
is that (given enough keys) there will be a lot of cache misses within OM. 
Within the DNs we also do not read data that were written at around the same 
time, so OS caches for I/O will also have misses.
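   
   To make that prediction a bit more concrete, here is a toy sketch in 
Python. It is not Ozone or freon code: the LRU cache, the function names, 
and all the numbers are assumptions chosen only for illustration, and the 
real OM/RocksDB/OS caches behave differently in detail. It only compares the 
hit ratio of a small, predictable key range against uniform random reads 
over a large key space.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay key accesses against a toy LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0


def ranged_reads(range_size, reads):
    """Predictable workload (hypothetical): loop over one small key range."""
    return (i % range_size for i in range(reads))


def random_reads(num_keys, reads, seed=0):
    """Unpredictable workload (hypothetical): uniform picks over all keys."""
    rng = random.Random(seed)
    return (rng.randrange(num_keys) for _ in range(reads))


if __name__ == "__main__":
    # Illustrative numbers only, not measured from any cluster.
    capacity, num_keys, reads = 1_000, 1_000_000, 100_000
    print("ranged:", lru_hit_rate(ranged_reads(500, reads), capacity))
    print("random:", lru_hit_rate(random_reads(num_keys, reads), capacity))
```

   In this toy model the ranged workload hits the cache on almost every read 
once the range is warm, while the random workload is expected to hit at 
roughly capacity / num_keys, which is the kind of miss-heavy behaviour I 
would like to observe on a real cluster.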
   
   I believe this gives us more insight into a real workload where 
potentially a lot of clients read different data at scale, and into what 
happens with memory consumption and performance in a highly utilized cluster 
where potentially any data can be read by multiple clients at the same time.
   
   Please correct me if I am wrong, but as I understand it, this level of 
read unpredictability cannot be reached with the other existing tools.
   
   On a side note, this tool does not really give insight into anything if 
the key sizes differ, so it is not meaningful to use it on a real cluster 
with varying file sizes. So @adoroszlai I completely agree with you, this 
tool does not produce useful info in a real env, but combined with the 
random key generator I believe the unpredictability of the results is 
something that gives insight and is useful. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

