Re: replication using _changes API

Adam Kocoloski Fri, 12 Jun 2009 08:01:11 -0700

On Jun 12, 2009, at 10:47 AM, Damien Katz wrote:

On Jun 12, 2009, at 8:59 AM, Adam Kocoloski wrote:
Hi Damien, I'm not sure I follow. My worry was that, if I built a replicator which only queried _changes to get the list of updates, I'd have to be prepared to process a very large response. I thought one smart way to process this response was to throttle the download at the TCP level by putting the socket into passive mode.
You will have a very large response, but you can stream it, processing one line at a time, then you discard the line and process the next. As long as the writer is using a blocking socket and the reader is only reading as much data as necessary to process a line, you never need to store much of the data in memory on either side. But it seems the HTTP client is buffering the data as it comes in, perhaps unintentionally.
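To illustrate the line-at-a-time idea, here is a minimal sketch in Python rather than Erlang. It assumes a feed in which every line is a standalone JSON object; the function name `iter_changes` and the sample rows are hypothetical, not taken from CouchDB. Only the line currently being processed is held in memory.

```python
import io
import json


def iter_changes(stream):
    """Yield one decoded row per line of a binary file-like object.

    Assumption: each non-empty line is a complete JSON object.
    """
    for raw in stream:            # file-like objects yield one line at a time
        line = raw.strip()
        if not line:              # skip blank lines between rows
            continue
        yield json.loads(line)    # the previous line is already discarded


# Hypothetical sample data standing in for a streamed response body.
sample = io.BytesIO(b'{"seq":1,"id":"a"}\n\n{"seq":2,"id":"b"}\n')
for row in iter_changes(sample):
    print(row["seq"], row["id"])
```

Because the generator pulls from the stream only when the consumer asks for the next row, the reader never gets ahead of the processing step.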
With TCP, the sending side will only send so much data before getting an ACK, acknowledgment that packets sent were actually received. When an ACK isn't received, the sender stops sending, and the TCP calls will block at the sender (or return an error if the socket is in non-blocking mode), until it gets a response or socket timeout.
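The sender-side behaviour can be observed with a small experiment, sketched here in Python over a loopback connection. The helper name `bytes_accepted_before_blocking` is hypothetical. The receiver accepts the connection but never reads, so once the kernel buffers on both sides fill up, a non-blocking send fails instead of queueing more data. The exact byte count depends on the operating system's buffer settings.

```python
import socket


def bytes_accepted_before_blocking(chunk=b"x" * 65536):
    """Send to a peer that never reads; return bytes accepted before the stall."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()        # connected, but we never call recv()
    sender.setblocking(False)              # report a full buffer as an error
    sent = 0
    try:
        while True:
            sent += sender.send(chunk)     # succeeds while buffers have room
    except BlockingIOError:
        pass                               # buffers full: the sender must wait
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent


print(bytes_accepted_before_blocking(), "bytes accepted before the sender stalled")
```

With a blocking socket the same `send` call would simply wait at that point, which is the throttling effect described above.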
So if you have a non-buffering reader and a blocking sender, then you can stream the data and only relatively small amounts of data are buffered at any time. The problem is the reader in the HTTP client isn't waiting for the data to be demanded at all; instead, as soon as data comes in, it sends it to a receiving Erlang process. Erlang processes never block to receive messages, so there is no limit to the amount of data buffered. So if the Erlang process can't process the data fast enough, it starts getting buffered in its mailbox, consuming unlimited memory.
Assuming I understand the problem correctly, the way to fix it is to have the HTTP client not read the data until it's demanded by the consuming process. Then we are only using the default TCP buffers, not the Erlang message queues as a buffer, and the total amount of memory used at any time is small.
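As a rough analogy for the demand-driven fix, the following Python sketch replaces the unbounded mailbox with a bounded queue. This is not the Erlang HTTP client's code; `run_pipeline` and its parameters are hypothetical names chosen for illustration. The producer blocks whenever the queue is full, so the amount of buffered data never exceeds `maxsize`, however slow the consumer is.

```python
import queue
import threading


def run_pipeline(n_items, maxsize):
    """Move n_items from a producer to a consumer through a bounded queue.

    Returns the items received and the largest queue size observed.
    """
    q = queue.Queue(maxsize=maxsize)

    def producer():
        for i in range(n_items):
            q.put(i)          # blocks while the queue is full (backpressure)
        q.put(None)           # sentinel: no more data

    worker = threading.Thread(target=producer)
    worker.start()

    received = []
    peak = 0
    while True:
        peak = max(peak, q.qsize())   # sample how much is buffered right now
        item = q.get()                # the consumer demands the next item
        if item is None:
            break
        received.append(item)
    worker.join()
    return received, peak


items, peak = run_pipeline(n_items=1000, maxsize=4)
print(len(items), "items received; peak buffered:", peak)
```

In Erlang terms the comparable approach would be a socket in passive mode, where data is fetched with explicit `gen_tcp:recv` calls, or `{active, once}`, where the socket delivers one message and then waits to be re-armed.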
-Damien

Yes, we're definitely in agreement, just using different language. I was trying to do exactly what you describe, but the HTTP client I was using seemed to be ignoring my request to switch the socket to passive mode (i.e. turn off buffering).

Best,
Adam
