davisp commented on PR #4264:
URL: https://github.com/apache/couchdb/pull/4264#issuecomment-1313966490

   I think this is papering over a different bug/misbehavior. I'm not sure if 
we documented it as such, but "active size" should be a rough representation of 
how many bytes are reachable from the root nodes in the header. The idea there 
being that active size would be a rough approximation of the size of the 
database just after compaction finishes.
   
   Going back to the original issue of "deleting gigabytes of data and smoosh 
not noticing", my wild guess is that small enough individual documents, 
coupled with a randomish deletion pattern (with respect to doc_id or 
update_seq order), could lead to enough "active garbage inner nodes" that 
smoosh gets confused.
   
   Also, how much data are we missing here? And/or can we generate a test case 
that demonstrates the behavior? If anyone has run this in production, I'd be 
curious to see how many compactions are running on the regular, along with the 
expected vs actual post compaction file sizes.
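   
   Something along these lines might work as a starting point for a reproduction. This is a rough Python sketch against a hypothetical local node with admin credentials, with made-up doc counts and sizes, not a drop-in test:

```python
import random
import time

import requests

# Hypothetical local node, credentials, db name, and doc counts/sizes.
COUCH = "http://admin:password@localhost:5984"
DB = "smoosh_repro"
N_DOCS = 100_000

requests.delete(f"{COUCH}/{DB}")  # ignore 404 on first run
requests.put(f"{COUCH}/{DB}").raise_for_status()

# Write lots of small documents in batches.
for start in range(0, N_DOCS, 1000):
    docs = [{"_id": f"doc-{i:07d}", "v": "x" * 32}
            for i in range(start, start + 1000)]
    requests.post(f"{COUCH}/{DB}/_bulk_docs", json={"docs": docs}).raise_for_status()

def sizes():
    return requests.get(f"{COUCH}/{DB}").json()["sizes"]

before = sizes()

# Delete ~90% of the docs in random doc_id order.
victims = random.sample(range(N_DOCS), int(N_DOCS * 0.9))
for start in range(0, len(victims), 1000):
    keys = [f"doc-{i:07d}" for i in victims[start:start + 1000]]
    rows = requests.post(f"{COUCH}/{DB}/_all_docs",
                         json={"keys": keys}).json()["rows"]
    deletes = [{"_id": r["id"], "_rev": r["value"]["rev"], "_deleted": True}
               for r in rows if "value" in r]
    requests.post(f"{COUCH}/{DB}/_bulk_docs", json={"docs": deletes}).raise_for_status()

after_delete = sizes()

# Compact explicitly and wait, then compare what smoosh would have seen.
requests.post(f"{COUCH}/{DB}/_compact",
              headers={"Content-Type": "application/json"}).raise_for_status()
while requests.get(f"{COUCH}/{DB}").json().get("compact_running"):
    time.sleep(1)

print("before:        ", before)
print("after deletes: ", after_delete)
print("after compact: ", sizes())
```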
   
   For instance, in the "smoosh didn't notice and perform compaction" case, did 
there end up being a massive file size shrinkage? Or is this a "huh, weird 
that there's no compaction triggered" issue? Given that deletions only remove 
doc bodies and the trees themselves aren't changing shape, maybe it's just the 
unexpected overhead of the trees swamping the smoosh calculations?
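   
   To put completely made-up numbers on that, assuming a 2.0 ratio threshold: if the tree overhead is comparable in size to the doc bodies themselves, deleting most of the bodies barely moves the file/active ratio:

```python
# Entirely made-up numbers, just to illustrate the shape of the problem.
tree_bytes = 800 * 1024 * 1024   # btree nodes still reachable, counted as active
body_bytes = 400 * 1024 * 1024   # small doc bodies before the deletions
file_bytes = 1600 * 1024 * 1024  # on-disk size (ignoring tombstone appends)

active_before = tree_bytes + body_bytes       # 1.2 GB
active_after = tree_bytes + body_bytes * 0.1  # 90% of bodies deleted, ~840 MB

print(file_bytes / active_before)  # ~1.33, below a 2.0 ratio threshold
print(file_bytes / active_after)   # ~1.90, still below, so no compaction fires
```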
   
   Basically, it does sound like there *could* be an issue here, but I doubt 
that this is the correct fix. Unless I'm forgetting something crazy big, this 
seems like an "external size after compression applied" calculation, which 
might be useful but is definitely different from what the original intent was.

