A quick note on the efficiency issue.  I just ran a quick test.  I
created 20 feed pages with 20 entries each.  Using the feed paging
model, I grab the first feed page and scan the entries to see if they
have been deleted or updated.  Deleted entries are represented by an
x:deleted element *within the entry*.  Then I grab the next page[1],
scan the entries, grab the next, scan the entries, and so on until I
come to an entry whose app:edited timestamp is earlier than the cutoff
I defined (which assumes the entries are sorted by app:edited, which is
not guaranteed)[2].  I ran the process 100 times, throwing away the
first run because of initialization overhead.  The times (ms) for each
run are:

[1681, 6114, 2049, 1923, 1657, 1596, 1634, 1676, 1630, 1635, 1790, 1610,
1900, 1639, 1604, 1809, 1763, 1775, 2040, 8441, 2126, 2229, 1713, 1687,
1812, 2012, 1722, 1770, 1734, 1683, 1625, 1613, 1622, 1617, 2130, 1687,
1675, 1867, 7261, 2006, 1661, 1610, 1662, 1584, 1686, 1753, 1632, 1629,
1808, 1691, 1631, 1756, 7585, 1660, 1638, 2671, 1702, 1872, 1593, 1895,
1606, 1692, 1647, 1987, 1750, 1880, 1708, 1793, 2160, 1654, 1854, 1931,
1613, 3006, 2041, 2319, 2493, 1654, 1713, 1619, 1588, 1765, 2046, 1579,
2308, 1940, 1820, 1996, 1655, 1603, 1863, 1723, 1628, 1575, 1645, 1568,
1860, 1700, 1864]

[1]: HTTP pipelining is enabled.
[2]: If entries are not sorted by app:edited, things get much worse.
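
For reference, the paged scan loop looks roughly like the sketch below.
This is illustrative only, not the actual test code; the feed URL, the
namespace URI bound to the x: prefix, and the cutoff value are
placeholders.

  import urllib.request
  import xml.etree.ElementTree as ET
  from datetime import datetime, timezone

  ATOM = "{http://www.w3.org/2005/Atom}"
  APP  = "{http://www.w3.org/2007/app}"               # app:edited
  X    = "{http://example.org/tombstones}"            # placeholder x: namespace
  CUTOFF = datetime(2007, 4, 1, tzinfo=timezone.utc)  # hypothetical cutoff

  def edited(entry):
      # app:edited carries an RFC 3339 timestamp
      text = entry.findtext(APP + "edited")
      return datetime.fromisoformat(text.replace("Z", "+00:00")) if text else None

  def scan_pages(url):
      """Walk rel='next' pages, collecting changes newer than CUTOFF."""
      changes = []
      while url:
          feed = ET.parse(urllib.request.urlopen(url)).getroot()
          for entry in feed.findall(ATOM + "entry"):
              ts = edited(entry)
              if ts is not None and ts < CUTOFF:
                  return changes            # relies on app:edited ordering [2]
              # a deleted entry carries an x:deleted element *within the entry*
              kind = "delete" if entry.find(X + "deleted") is not None else "update"
              changes.append((kind, entry.findtext(ATOM + "id")))
          nxt = feed.find(ATOM + "link[@rel='next']")
          url = nxt.get("href") if nxt is not None else None
      return changes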

By comparison, I created a single snapshot feed containing exactly the
same set of modified entries.  Deleted entries are represented using
x:deleted tombstone elements rather than atom:entry elements.  The feed
ONLY contains the x:deleted and atom:entry elements representing items
updated after the cutoff I selected.  I ran the process 100 times,
throwing away the first run because of initialization overhead.  The
times (ms) for each run are:

[225, 253, 415, 208, 155, 95, 104, 66, 106, 74, 102, 76, 70, 113, 64,
68, 73, 90, 64, 101, 427, 181, 177, 58, 58, 947, 50, 46, 48, 70, 83, 89,
41, 143, 74, 101, 39, 59, 38, 41, 37, 45, 269, 40, 403, 37, 38, 37, 62,
38, 36, 64, 36, 37, 57, 36, 37, 70, 36, 38, 84, 41, 34, 50, 33, 33, 53,
130, 107, 132, 55, 37, 53, 38, 39, 88, 39, 39, 168, 95, 73, 102, 62, 48,
84, 47, 45, 133, 43, 81, 367, 55, 50, 89, 45, 67, 47, 67, 57]
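
Processing the snapshot feed, by contrast, is a single request and a
single pass.  The sketch below reuses the names from the sketch above;
the "ref" attribute on x:deleted is an assumption about how the
tombstone points at the deleted entry's atom:id.

  def scan_snapshot(url):
      """One request; the server already applied the cutoff and tombstoning."""
      feed = ET.parse(urllib.request.urlopen(url)).getroot()
      changes = []
      for child in feed:
          if child.tag == X + "deleted":        # tombstone, not an atom:entry
              changes.append(("delete", child.get("ref")))
          elif child.tag == ATOM + "entry":
              changes.append(("update", child.findtext(ATOM + "id")))
      return changes

There is no cutoff comparison and no next-page traversal on the client,
which is presumably where most of the difference in the numbers above
comes from.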

The test feeds and code are located at
http://www.snellspace.com/public/feeds

- James

Brian Smith wrote:
> James M Snell wrote:
>> Brian Smith wrote:
>>> James Snell wrote at http://www.snellspace.com/wp/?p=818:
>>>> And Because the entire snapshot is contained within a single feed 
>>>> document, we do not have to worry about the race condition that is 
>>>> inherent in the rolling window feed paging model.
>>> The race condition is not an inherent problem in the paging 
>>> model; it is an implementation issue. In my implementation, I 
>>> do all the paging in my collection feed using links like:
>> Note that I specifically indicated the rolling window feed 
>> paging model.
>>  Yes, there are ways of doing the paging where the pages are 
>> stable, and if we can rely on folks doing it that way 
>> consistently then paging is fine.  The key requirement is 
>> that the snapshot has to remain stable throughout the sync process.
> 
> To be clear, I don't provide any kind of snapshot mechanism. The client needs 
> to go back to the start of the feeds (most recently edited entries) after it 
> has paged through all the older entries, to see if some older entry has been 
> modified during the paging process. If some entry or entries are constantly 
> being edited, then it is possible that the client will never see them. On the 
> other hand, what good does it do for the client to receive an old 
> representation of an entry that is currently being relentlessly edited? The 
> cost of implementing snapshotting seems to nearly always outweigh any 
> benefits it might have.
> 
> The only guarantees that I make beyond what AtomPub requires are: (1) if any 
> entries are skipped during paging, they will appear at the start of the feed, 
> (2) navigating to each "prev" page will always result in older entries than 
> the ones in the current page, and (3) navigating to each "next" page will 
> result in an empty page or in entries newer than the ones in the current 
> page. Implementations that do paging by numbering each page ("?page=2", 
> "?page=3") often cannot make those guarantees when entries are being 
> added/edited faster than the client can page through the feed.
> 
>> That said, however, once the initial sync is completed, 
>> subsequent incremental syncs are much more efficient if 
>> paging is not used (e.g. give me one feed document describing
>> just the things that have changed over the last twelve hours, etc).
> 
> On the other hand, if there were hundreds of entries updated in the last 12 
> hours (e.g. a server upgrade which modifies the representation of every 
> entry, or whatever else WordPress users do that causes every entry to appear 
> updated), the client and the server will probably both benefit from chunking 
> that up. Especially on mobile phones, the larger the response, the less 
> likely the service provider's proxies will let it go through to the device.
> 
>> Again, this would work so long as the pages remain stable; 
>> however, I would very much like to see how the two approaches 
>> compare in terms of performance / efficiency.
> 
> Well, obviously paging is going to require more roundtrips if there are 
> multiple pages used for the sync. If the server decides to use one page per 
> resync then both approaches are equivalent. In fact, disregarding 
> snapshotting, my solution is the same as yours, except in mine the server 
> controls the values of the "since" and "until" parameters, whereas in yours 
> the client controls them, and mine allows the server to break up the response 
> into multiple (cacheable) pages.
> 
> - Brian
> 
> 
