[
https://issues.apache.org/jira/browse/COUCHDB-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695890#action_12695890
]
Robert Newson commented on COUCHDB-220:
---------------------------------------
It appears that the .couch file is extended by 64K every time a document is
added, regardless of whether the document is only a few hundred bytes.
Chatting with davisp; transcript below:
(18:27:14) davisp: got that test handy so you can run it after a slight tweak to
couchdb?
(18:27:46) rnewson: the sparseness one? yep.
(18:27:53) davisp: rnewson: line 41 in couchdb_stream.erl
(18:28:11) davisp: Try changing that from 16#000010000 to 1
(18:28:39) rnewson: min_alloc, yes?
(18:28:39) davisp: Not sure if that'll break things or not
(18:28:44) rnewson: we'll soon know.
(18:28:49) davisp: But I ran across it when reading
(18:28:55) davisp: rnewson: yep on min alloc
(18:29:12) rnewson: yes, that did it.
...
(18:34:14) rnewson: davisp: I'm glad you did, the difference is dramatic, I'd
say this is the cause of the behavior I see.
(18:34:36) davisp: It could be that couch_stream has a bug that's preventing it
from using leftover space
(18:34:43) rnewson: davisp: As I said, I actually hit the ext3 max-file-size
with this problem.
(18:35:10) davisp: i.e., the 65K is intended to be used by multiple documents,
but bookkeeping is saying to constantly create new buffers
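The arithmetic behind the symptom can be sketched in a few lines (this is an illustrative Python model, not CouchDB's actual Erlang code): if every write reserves at least min_alloc (16#10000, i.e. 64 KiB) instead of reusing leftover buffer space, ten thousand tiny documents inflate the file by roughly the amount reported in this issue.

```python
# Sketch (not couch_stream.erl itself): per-write minimum allocation of
# 64 KiB inflates the apparent file size when each small document
# triggers a fresh buffer instead of reusing leftover space.
MIN_ALLOC = 0x10000  # couch_stream's min_alloc, 64 KiB

def allocated(doc_sizes, min_alloc=MIN_ALLOC):
    # Each write reserves at least min_alloc bytes.
    return sum(max(size, min_alloc) for size in doc_sizes)

docs = [300] * 10_000             # ten thousand ~300-byte documents
apparent = allocated(docs)        # bytes the file is extended by
actual = sum(docs)                # bytes of real data
print(apparent // 2**20, "MiB apparent vs", actual // 2**10, "KiB of data")
```

With these assumed numbers the model yields about 625 MiB of apparent size for under 3 MiB of data, which is in the same ballpark as the 698M-vs-57M discrepancy in the issue description.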
...
(19:01:00) davisp: rnewson: It just looks like the buffer state for.... oh dear
god
(19:01:20) rnewson: davisp: epiphany?
(19:01:49) davisp: rnewson: I wonder if it's only holding buffer state for the
duration of a single request. Try adding two attachments with the same data
...
(19:10:23) davisp: It looks like a consequence of the necessary code for
streaming files that didn't specify a content-length
(19:10:45) davisp: rnewson: Looks like ensure_buffer needs a flag
(19:13:01) davisp: rnewson: My guess is that you'd want to add a flag in the
accumulator on the PreAllocSize fold function that says if you have touched the
clause that has an unknown length
(19:13:21) davisp: then pass that flag to ensure_buffer and if the flag is true
in ensure_buffer you allocate exactly the specified size.
(19:13:30) davisp: instead of the MinSize bit
(19:13:49) rnewson: makes sense.
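davisp's proposed fix above can be sketched as follows. This is a hypothetical Python model, not the real couch_stream.erl API: the function name ensure_buffer comes from the transcript, but the signature and the unknown_length flag name are illustrative assumptions.

```python
# Hypothetical sketch of the fix davisp describes: carry a flag through
# the PreAllocSize fold that records whether the unknown-length clause
# was hit, and have ensure_buffer allocate exactly the requested size
# in that case instead of rounding up to MinSize.
MIN_SIZE = 0x10000  # the 64 KiB floor from couch_stream.erl

def ensure_buffer(needed, unknown_length):
    # unknown_length: True once the fold has touched the clause for a
    # stream with no declared content-length (illustrative name).
    if unknown_length:
        return needed                # allocate exactly the specified size
    return max(needed, MIN_SIZE)     # otherwise keep the MinSize floor

print(ensure_buffer(300, unknown_length=True))   # exact allocation
print(ensure_buffer(300, unknown_length=False))  # rounded up to 64 KiB
```

The point of the flag is that preallocation is only useful when the stream length is unknown up front; once the length is known, over-allocating buys nothing and leaves the holes this issue describes.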
> Extreme sparseness in couch files
> ---------------------------------
>
> Key: COUCHDB-220
> URL: https://issues.apache.org/jira/browse/COUCHDB-220
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.9
> Environment: ubuntu 8.10 64-bit, ext3
> Reporter: Robert Newson
>
> When adding ten thousand documents, each with a small attachment, the
> discrepancy between reported file size and actual file size becomes huge;
> ls -lh shard0.couch
> 698M 2009-01-23 13:42 shard0.couch
> du -sh shard0.couch
> 57M shard0.couch
> On filesystems that do not support write holes, this will cause an order of
> magnitude more I/O.
> I think it was introduced by the streaming attachment patch as each
> attachment is followed by huge swathes of zeroes when viewed with 'hd -v'.
> Compacting this database reduced it to 7.8mb, indicating other sparseness
> besides attachments.
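The ls-vs-du discrepancy quoted above is the classic signature of a sparse file. A quick standalone demonstration (plain Python, nothing CouchDB-specific): seeking past EOF before writing leaves a hole, so the apparent size (st_size, what ls -l reports) far exceeds the blocks actually allocated (st_blocks, what du reports).

```python
# Demonstrate a write hole: skip 64 KiB (as each preallocation did)
# and write a single byte, leaving the skipped region unallocated on
# filesystems that support sparse files.
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.seek(64 * 1024 - 1)   # seek past EOF, creating a hole
    f.write(b"\x00")        # one real byte at the end of the "buffer"

st = os.stat(path)
print("apparent:", st.st_size, "bytes")          # what ls -l reports
print("on disk: ", st.st_blocks * 512, "bytes")  # what du reports
os.remove(path)
```

On a filesystem without hole support, the skipped bytes are written out as literal zeros, which is the order-of-magnitude extra I/O the report mentions.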
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.