Github user davisp commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/24#issuecomment-67239293
  
    That looks pretty good but there's still an issue with the counting of 
active size. Unfortunately I think the fix is going to require us to start 
using @strmpnk's new attachments module which appears to introduce some new 
issues once we start using extended attributes.
    
    The issue is that when we update documents we don't know if the doc on disk 
is de-duplicated or not. So if we go to edit a doc that has a de-duplciated 
attachment will suddenly start double counting the bytes on disk which breaks 
our active size information. A test case would be something like this:
    
        1. Create doc A with attachment
        2. Create doc B with same attachment that will be de-duped
        3. Compact the database
        4. Check our active size
        5. Modify doc B (in a way that keeps the de-duped attachment without 
copying it)
        6. Check that active size isn't increased by more than the attachment 
size
    
    I'd suggest making sure that the attachment is something like 64K so that 
its obvious if we accidentally started double counting (modifying doc B should 
increase active size slightly by definition).
    
    The fix here I think is to use the couch_att module's ability to store 
extended attributes that shows if the attachment was deduplicated. 
Theoretically wrapping up some of the compaction code will clean up that logic 
anyway so that's a good thing.
    
    Unfortunately it looks like we haven't yet started accounting for 
attachments with extended attributes in the compactor. Our current attachment 
terms are those unwiedly 8-tuples. Anything with extended attributes will turn 
that into a 2-tuple with the first element being the 8-tuple, and the second 
element being a proplist of key/value pairs for the attachment. Theoretically 
we should be able to use this to store if the attachment was de-duplicated at 
compaction time but we'll have to make sure it works with compactions and 
upgrades which it appears hasn't been completely vetted yet.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to