2009/7/7 Matt Goodall <[email protected]>:
> Splitting the discussion of line breaks in the _changes document into
> separate email thread ...
>
> 2009/7/6 Chris Anderson <[email protected]>:
>> On Mon, Jul 6, 2009 at 5:50 AM, Matt Goodall<[email protected]> wrote:
>
>>> == Line Breaks ==
>>>
>>> If each results item is sent with its ending newline (the "," is sent
>>> with the next item), it would make clients much easier to write
>>> correctly, i.e. buffer bytes until a newline is received, split the
>>> buffer, process the row, repeat. You've still got to remove the ","
>>> from all but the first line, but it's in a predictable place. Actually,
>>> I don't believe TCP provides any guarantee that bytes sent are
>>> received in the same chunks, so relying on anything other than the
>>> newline is probably flawed.
>>>
>>> It's a trivial change, patch attached.
>>
>> There's a certain elegance to the current system. So far I've been
>> testing in the browser and it works fine. If there are demonstrated
>> problems for a client, then we shouldn't hesitate to change it.
>
> Agreed, a comma at the end of the line is much prettier.
>
> The 'changes' tests are very unlikely to highlight any problem because
> there's such a small amount of data being sent (well below the MTU of
> the network device) and a sleep(100) is almost certainly enough to
> allow the data to arrive in the browser. If the tests caused lots of
> data to be sent and the browser was listening for data using an
> onreadystatechange callback, we may be lucky enough to hit the problem.
> However, "almost" and "may" are not good words when it comes to
> testing ;-).
>
> Anyway, from experience I believe the only way to prove this is to
> explicitly have the bytes arrive slowly so, when I get a couple of
> minutes, I'll write something simple that will hopefully demonstrate
> the value of the newline terminator.
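
The framing rule proposed in the quoted message (buffer bytes until a newline arrives, split, keep the incomplete tail) can be sketched in Python 3 as a small generator. `iter_lines` is a hypothetical helper, not part of CouchDB or the attached script; the point of the sketch is that its output does not depend on how the bytes happened to be chunked in transit.

```python
from typing import Iterable, Iterator


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete lines (without their newline) from byte chunks.

    Chunk boundaries may fall anywhere, including mid-line, because TCP
    does not preserve the sizes of the sender's writes. Bytes after the
    last newline are held back until more data arrives; if the stream
    ends without a final newline, those bytes are discarded.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the tail may not be.
        *complete, buffer = buffer.split(b"\n")
        yield from complete
```

Feeding it the same newline-terminated stream in 1-byte chunks, 5-byte chunks or as a single chunk yields the same list of lines, which is exactly the property a client relying only on the newline gets for free.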

Attached is a quick and dirty Python script with two versions of a
client for a continuous _changes stream:

* changes_comma_eol works with CouchDB trunk.
* changes_eol_comma works with a patched CouchDB.

I really haven't tested them extensively, but I think both are correct,
although I wouldn't be surprised if there are some edge cases I've
missed, especially in the changes_comma_eol version. I think it's
reasonably clear which version is simpler, and therefore less error
prone, for a client to implement.

There are definitely some things that could be done to improve the
comma_eol version a little, but I wanted to keep the code as simple as
possible, and I don't think there's any way of completely avoiding some
unnecessary JSON parsing.

Hope that's useful. I'll create a ticket with the patch and the
example so it doesn't get lost.

- Matt

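import httplib
import json


def GET(db):
    conn = httplib.HTTPConnection('localhost', 5984)
    conn.request('GET', '/%s/_changes?continuous=true&timeout=2000' % db)
    response = conn.getresponse()

    def gen_data():
        # Read a single byte at a time so the parsers below see the data in
        # the smallest possible chunks, i.e. the worst case a client can
        # receive from the network.
        while True:
            data = response.read(1)
            if not data:
                break
            yield data
        conn.close()

    return response.getheaders(), gen_data()

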
def changes_eol_comma(db, callback):
    headers, stream = GET(db)
    buffer = ''
    for data in stream:
        buffer += data
        lines = buffer.split('\n')
        # Iterate only the lines we know are complete.
        for line in lines[:-1]:
            # Skip uninteresting lines.
            if line == '{"results":[' or line == '],' or line == '':
                continue
            # Handle "last_seq" line to get final seq: strip the 11-char
            # '"last_seq":' prefix and the closing '}'.
            if line.startswith('"last_seq"'):
                return int(line[11:-1])
            # We have a changes row left. It may include a leading comma.
            if line[0] == ',':
                line = line[1:]
            callback(json.loads(line))
        # Buffer remaining bytes.
        buffer = lines[-1]


def changes_comma_eol(db, callback):
    headers, stream = GET(db)
    buffer = ''
    for data in stream:
        buffer += data
        lines = buffer.split('\n')
        # Iterate all lines even though the last one may not be complete yet.
        for line in lines:
            # Skip uninteresting lines.
            if line == '{"results":[' or line == '],' or line == '':
                continue
            # Handle "last_seq" line to get final seq.
            if line.startswith('"last_seq"'):
                # But only if we've received the whole line.
                if not line.endswith('}'):
                    buffer = line
                    continue
                return int(line[11:-1])
            # The line is now either a full change row or part of a line that
            # makes up some part of the document but is not enough to identify
            # it yet.
            try:
                # There may be a leading or trailing comma on the line
                # depending on whether a previous line's ending comma had
                # arrived when the line was parsed. Just in case, strip from
                # both ends.
                row = json.loads(line.strip(','))
            except ValueError:
                # Couldn't parse it so leave until we have more bytes.
                buffer = line
            else:
                buffer = ''
                # Call back outside the try so a ValueError raised by the
                # callback is not mistaken for an incomplete row.
                callback(row)


def changed(row):
    print "** change:", row


# Uncomment the version that matches CouchDB.
#last_seq = changes_eol_comma('test', changed)
last_seq = changes_comma_eol('test', changed)
print "** last_seq:", last_seq
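
Running the script needs a live CouchDB. One way to check the patched-format logic offline is to pull the inner loop of changes_eol_comma out into a function that takes already-split lines. Below is a Python 3 sketch of that; the name `handle_changes_lines` is hypothetical, and the sample lines in the usage note are assumed from the envelope lines the script skips, not captured from a server.

```python
import json
from typing import Any, Callable, Iterable, Optional

# Text that precedes the final sequence number on the closing line.
LAST_SEQ_PREFIX = '"last_seq":'


def handle_changes_lines(lines: Iterable[str],
                         callback: Callable[[Any], None]) -> Optional[int]:
    """Process complete lines of a newline-terminated (patched) _changes
    response, passing each decoded row to callback.

    Returns the last_seq value, or None if the lines run out first.
    """
    for line in lines:
        # Envelope lines carry no change data.
        if line in ('{"results":[', '],', ''):
            continue
        # The closing line looks like '"last_seq":N}'; slice out N.
        if line.startswith(LAST_SEQ_PREFIX):
            return int(line[len(LAST_SEQ_PREFIX):-1])
        # In the patched format every row after the first starts with
        # the separating comma.
        if line.startswith(','):
            line = line[1:]
        callback(json.loads(line))
    return None
```

Feeding it `['{"results":[', '{"seq":1,"id":"a"}', ',{"seq":2,"id":"b"}', '],', '"last_seq":2}']` with `rows.append` as the callback collects both rows and returns 2. Keeping the newline framing separate from row handling means each part can be tested on its own.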