Hi Antony,
On May 16, 2009, at 10:39 AM, Antony Blakey wrote:
I can confirm that the target and source of replicated resources
affected by this issue are identical with this fix, and both are
correct, i.e. uncorrupted, although this is only according to the
failures I've seen.
Thanks! Makes me feel better, at least.
Now, on to the checkpointing conditions. I think there's some
confusion about the attachment workflow. Attachments are
downloaded _immediately_ and in their entirety by ibrowse, which
then sends the data as 1MB binary chunks to the attachment receiver
processes.
Are they downloaded to disk by ibrowse?
No, I don't believe so. ibrowse accepts a {stream_to, pid()} option.
It accumulates packets until it reaches a threshold configurable by
{stream_chunk_size, integer()} (default 1MB), then sends the data to
the Pid. I don't think ibrowse is writing to disk at any point in
the process. We do see that when streaming really large attachments,
ibrowse becomes the biggest memory user in the emulator.
ibrowse does offer a {save_response_to_file, boolean()|filename()}
option that we could possibly leverage.
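The accumulate-then-flush behaviour described above can be sketched as follows. This is illustrative Python, not ibrowse itself; the class and names are invented for the example, and only the threshold idea (mirroring ibrowse's {stream_chunk_size, integer()} option, default 1MB) comes from the discussion above.

```python
CHUNK_SIZE = 1024 * 1024  # 1MB, the default threshold mentioned above

class StreamBuffer:
    """Hypothetical receiver: buffer incoming packets and hand data to
    the consumer only once a configurable threshold is reached."""

    def __init__(self, sink, chunk_size=CHUNK_SIZE):
        self.sink = sink              # callable receiving each flushed chunk
        self.chunk_size = chunk_size  # flush threshold in bytes
        self.buf = bytearray()

    def feed(self, packet):
        # Accumulate a network packet; flush whole chunks past the threshold.
        self.buf.extend(packet)
        while len(self.buf) >= self.chunk_size:
            self.sink(bytes(self.buf[:self.chunk_size]))
            del self.buf[:self.chunk_size]

    def close(self):
        # Flush whatever remains at end of stream.
        if self.buf:
            self.sink(bytes(self.buf))
            self.buf.clear()
```

Note that everything stays in memory between flushes, which is consistent with ibrowse showing up as the biggest memory user when streaming very large attachments.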
In another thread Matt Goodall suggested checkpointing after a
certain amount of time has passed. So we'd have a checkpointing
algo that considers
* memory utilization
* number of pending writes
* time elapsed
That seems to cover both resource usage and incremental progress. As
far as the couch_util:should_flush mechanism is concerned, I think a
good idea would be to commit 1 document, then 2, then 4, i.e. a
binary increasing window, which adapts well to both unreliable and
reliable connections without requiring configuration. Configuration
is tricky here because you may want to run the system in a variety of
scenarios without knowing the failure characteristics in advance
(and they may change over time).
It sounds like a good idea. I had thought about doing the same for
the process that pulls new docs from the source server, so that we
could do a better job of filling up the pipes when we're dealing with
the common case of small documents without significant attachment data.
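The two ideas above can be sketched together. This is a hedged Python illustration, not CouchDB's actual couch_util:should_flush implementation: the class names and all thresholds are made up, and only the three checkpoint conditions (memory, pending writes, elapsed time) and the doubling commit window that resets on failure come from the thread.

```python
import time

class CheckpointPolicy:
    """Hypothetical trigger combining the three conditions discussed:
    memory utilization, number of pending writes, and time elapsed."""

    def __init__(self, max_memory=64 * 1024 * 1024,
                 max_pending=1000, max_elapsed=60.0):
        self.max_memory = max_memory    # bytes of buffered data (invented cap)
        self.max_pending = max_pending  # unflushed writes (invented cap)
        self.max_elapsed = max_elapsed  # seconds since last checkpoint
        self.last_checkpoint = time.monotonic()

    def should_checkpoint(self, memory_used, pending_writes):
        elapsed = time.monotonic() - self.last_checkpoint
        return (memory_used >= self.max_memory
                or pending_writes >= self.max_pending
                or elapsed >= self.max_elapsed)

    def checkpointed(self):
        self.last_checkpoint = time.monotonic()

class CommitWindow:
    """Binary increasing window: commit 1 doc, then 2, then 4, doubling
    while commits succeed; drop back to 1 after any failure."""

    def __init__(self, ceiling=256):
        self.size = 1
        self.ceiling = ceiling  # invented cap so the window stays bounded

    def on_success(self):
        self.size = min(self.size * 2, self.ceiling)

    def on_failure(self):
        self.size = 1
```

The appeal of the window is exactly what the thread notes: on a flaky link the window stays small and little work is lost per failure, while on a reliable link it grows toward large batches with no tuning required.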
While we're on this - any idea why couchdb is quitting during
replication? It's not giving me any errors.
Errm, no, I'm afraid I don't have any idea there. I remember one or
two other reports in JIRA that sound similar, but I've not been able
to reproduce them. Are you keeping an eye on the memory usage? I
think an out-of-memory error can trigger this sudden death in Erlang.
Sorry, that's the best I've got at the moment.
Adam