Hi folks! I've had to work through various cluster operations on a table with a couple of million MOB hfiles enough times lately that I took the time to write down both a) how to stand up an approximation elsewhere to get reproduction and timing information, and b) step-by-step examples, including how to track how things are going.
To be clear, it's still a draft. It's also currently markdown. If it helps to see the specifics of what I'm talking about, it's here: https://gist.github.com/busbey/5ff88e31705e52a392392b4fb2eadac2 . I'd like to get this kind of stuff folded into our community docs somewhere, but I'm not sure where something like this would fit. I don't think it works for the current one-ref-guide-to-rule-them-all. What do y'all think?