davisp commented on PR #4264: URL: https://github.com/apache/couchdb/pull/4264#issuecomment-1313966490
I think this is papering over a different bug/misbehavior. I'm not sure if we documented it as such, but "active size" should be a rough representation of how many bytes are reachable from the root nodes in the header. The idea is that active size would be a rough approximation of the size of the database just after compaction finishes.

Going back to the original issue of "deleting gigabytes of data and smoosh not noticing", my wild guess is that small enough individual documents, coupled with deletions in a randomish order (with respect to doc_id or update_seq), could lead to enough "active garbage inner nodes" that smoosh gets confused.

Also, how much data are we missing here? And/or can we generate a test case that demonstrates the behavior? If anyone has run this in production, I'd be curious to see how many compactions are running on the regular, along with the expected vs. actual post-compaction file sizes. For instance, in the "smoosh didn't notice and perform compaction" case, did there end up being a massive file size shrinkage? Or is this a "huh, weird that there's no compaction triggered" issue? Given that deletions only remove doc bodies and the trees themselves aren't changing shape, maybe it's just unexpected overhead of the trees swamping the smoosh calculations?

Basically, it does sound like there *could* be an issue here, but I doubt that this is the correct fix. Unless I'm forgetting something crazy big, this basically seems like an "external size after compression applied" calculation, which might be useful but is definitely different from what the original intent was.
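To make the "tree overhead swamping the smoosh calculations" guess concrete, here is a minimal sketch. It assumes a ratio-style priority of file size divided by active size, compared against a threshold; the function names, the `min_priority` value of 2.0, and all byte counts are hypothetical and chosen only for illustration, not taken from smoosh or from the PR.

```python
def compaction_ratio(file_size: int, active_size: int) -> float:
    """Ratio-style priority: total file bytes over bytes counted as active."""
    if active_size <= 0:
        raise ValueError("active_size must be positive")
    return file_size / active_size


def should_compact(file_size: int, active_size: int, min_priority: float) -> bool:
    """Assumed rule: compact only when the ratio exceeds the threshold."""
    return compaction_ratio(file_size, active_size) > min_priority


# Hypothetical database with many small documents, so the trees are a
# large share of the file. Units are arbitrary (think MB).
tree_bytes = 6_000           # inner and leaf nodes reachable from the header
body_bytes_before = 4_000    # document bodies before the deletions
file_size = tree_bytes + body_bytes_before

# Deletions drop most doc bodies, but the trees keep their shape, so
# they still count toward active size.
body_bytes_after = 500
active_with_trees = tree_bytes + body_bytes_after   # 6_500
active_bodies_only = body_bytes_after               # 500

# With the trees counted as active, the ratio stays under the threshold.
tree_swamped = should_compact(file_size, active_with_trees, min_priority=2.0)

# Counting only the surviving bodies gives a very different answer.
bodies_only = should_compact(file_size, active_bodies_only, min_priority=2.0)
```

Under these made-up numbers the first check does not trigger compaction and the second does, which is the gap between "bytes reachable from the root nodes" and an external-size-style measure. Whether real databases behave this way is exactly what the requested production numbers would show.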
