A quick note on the efficiency issue. I ran a quick test: I created 20 feed
pages, each with 20 entries. Using the feed paging model, I grab the first
feed document and scan its entries to see whether they have been deleted or
updated. Deleted entries are represented by an x:deleted element *within the
entry*. I then grab the next page[1], scan its entries, grab the next, and so
on, until I reach an entry whose app:edited timestamp is earlier than the
cutoff I defined (which assumes the entries are sorted by app:edited, which
is not guaranteed)[2]. I ran the process 100 times, throwing away the first
run because of initialization overhead.
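
In outline, each run does something like the following. This is only a rough
Python sketch of the loop just described, not the actual test code (that is
linked below); the x: namespace URI, the RFC 3339 parsing, and the lack of
error handling are placeholders/assumptions.

import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = '{http://www.w3.org/2005/Atom}'
APP = '{http://www.w3.org/2007/app}'
X = '{http://example.org/tombstones}'  # placeholder for the x: namespace

def rfc3339(s):
    # app:edited carries an RFC 3339 timestamp; spell out the UTC offset for
    # datetime.fromisoformat
    return datetime.fromisoformat(s.replace('Z', '+00:00'))

def sync_paged(first_page_url, cutoff):
    """Walk the pages newest-first until app:edited drops below the cutoff
    (a timezone-aware datetime). Assumes entries are sorted by app:edited,
    which is not guaranteed."""
    updated, deleted = [], []
    url = first_page_url
    while url:
        feed = ET.parse(urllib.request.urlopen(url)).getroot()
        for entry in feed.findall(ATOM + 'entry'):
            if rfc3339(entry.findtext(APP + 'edited')) < cutoff:
                return updated, deleted               # everything older is unchanged
            if entry.find(X + 'deleted') is not None: # deletion marker inside the entry
                deleted.append(entry.findtext(ATOM + 'id'))
            else:
                updated.append(entry)
        nxt = feed.find(ATOM + "link[@rel='next']")   # follow the paging link to older entries
        url = nxt.get('href') if nxt is not None else None
    return updated, deleted
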
The times (ms) for each run are:

[1681, 6114, 2049, 1923, 1657, 1596, 1634, 1676, 1630, 1635, 1790, 1610, 1900, 1639, 1604, 1809, 1763, 1775, 2040, 8441, 2126, 2229, 1713, 1687, 1812, 2012, 1722, 1770, 1734, 1683, 1625, 1613, 1622, 1617, 2130, 1687, 1675, 1867, 7261, 2006, 1661, 1610, 1662, 1584, 1686, 1753, 1632, 1629, 1808, 1691, 1631, 1756, 7585, 1660, 1638, 2671, 1702, 1872, 1593, 1895, 1606, 1692, 1647, 1987, 1750, 1880, 1708, 1793, 2160, 1654, 1854, 1931, 1613, 3006, 2041, 2319, 2493, 1654, 1713, 1619, 1588, 1765, 2046, 1579, 2308, 1940, 1820, 1996, 1655, 1603, 1863, 1723, 1628, 1575, 1645, 1568, 1860, 1700, 1864]

[1]: HTTP pipelining is enabled.
[2]: If entries are not sorted by app:edited, things get much worse.

By comparison, I created a single snapshot feed containing exactly the same
set of modified entries. Deleted entries are represented using an x:deleted
tombstone element rather than atom:entry elements. The feed ONLY contains
x:deleted elements and atom:entry elements representing items updated after
the cutoff I selected. I ran the process 100 times, throwing away the first
run because of initialization overhead. The times (ms) for each run are:

[225, 253, 415, 208, 155, 95, 104, 66, 106, 74, 102, 76, 70, 113, 64, 68, 73, 90, 64, 101, 427, 181, 177, 58, 58, 947, 50, 46, 48, 70, 83, 89, 41, 143, 74, 101, 39, 59, 38, 41, 37, 45, 269, 40, 403, 37, 38, 37, 62, 38, 36, 64, 36, 37, 57, 36, 37, 70, 36, 38, 84, 41, 34, 50, 33, 33, 53, 130, 107, 132, 55, 37, 53, 38, 39, 88, 39, 39, 168, 95, 73, 102, 62, 48, 84, 47, 45, 133, 43, 81, 367, 55, 50, 89, 45, 67, 47, 67, 57]

The test feeds and code are located at http://www.snellspace.com/public/feeds

- James

Brian Smith wrote:
> James M Snell wrote:
>> Brian Smith wrote:
>>> James Snell wrote at http://www.snellspace.com/wp/?p=818:
>>>> And Because the entire snapshot is contained within a single feed
>>>> document, we do not have to worry about the race condition that is
>>>> inherent in the rolling window feed paging model.
>>> The race condition is not an inherent problem in the paging
>>> model; it is an implementation issue. In my implementation, I
>>> do all the paging in my collection feed using links like:
>> Note that I specifically indicated the rolling window feed
>> paging model.
>> Yes, there are ways of doing the paging where the pages are
>> stable, and if we can rely on folks doing it that way
>> consistently then paging is fine. The key requirement is
>> that the snapshot has to remain stable throughout the sync process.
>
> To be clear, I don't provide any kind of snapshot mechanism. The client needs
> to go back to the start of the feeds (most recently edited entries) after it
> has paged through all the older entries, to see if some older entry has been
> modified during the paging process. If some entry or entries are constantly
> being edited, then it is possible that the client will never see them. On the
> other hand, what good does it do for the client to receive an old
> representation of an entry that is currently being relentlessly edited? The
> cost of implementing snapshotting seems to nearly always outweigh any
> benefits it might have.
>
> The only guarantees that I make beyond what AtomPub requires are: (1) if any
> entries are skipped during paging, they will appear at the start of the feed,
> (2) navigating to each "prev" page will always result in older entries than
> the ones in the current page, and (3) navigating to each "next" page will
> result in an empty page or in entries newer than the ones in the current
> page.
> Implementations that do paging by numbering each page ("?page=2",
> "?page=3") often cannot make those guarantees when entries are being
> added/edited faster than the client can page through the feed.
>
>> That said, however, once the initial sync is completed,
>> subsequent incremental sync's are much more efficient if
>> paging is not used (e.g. give me one feed document describing
>> just the things that have changed over the last twelve hours, etc).
>
> On the other hand, if there were hundreds of entries updated in the last 12
> hours (e.g. a server upgrade which modifies the representation of every
> entry, or whatever else WordPress users do that causes every entry to appear
> updated), the client and the server will probably both benefit from chunking
> that up. Especially, on mobile phones, the larger the response, the less
> likely the service provider's proxies will let it go through to the device.
>
>> Again, this would work so long as the pages remain stable;
>> however, I would very much like to see a comparison of how
>> the two approaches compare in terms of performance / efficiency.
>
> Well, obviously paging is going to require more roundtrips if there are
> multiple pages used for the sync. If the server decides to use one page per
> resync then both approaches are equivalent. In fact, disregarding
> snapshotting, my solution is the same as yours, except in mine the server
> controls the values of the "since" and "until" parameters, whereas in yours
> the client controls them, and mine allows the server to break up the response
> into multiple (cacheable) pages.
>
> - Brian
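
For comparison with the discussion above, the incremental request could look
roughly like the sketch below. This is only an illustration, not taken from
either implementation; the "since" parameter name, the feed URL handling, the
"ref" attribute on the tombstone, and the x: namespace URI are all assumed.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'
X = '{http://example.org/tombstones}'  # placeholder for the x: namespace

def sync_incremental(feed_url, cutoff_iso):
    # One GET returns only what changed since the cutoff: atom:entry elements
    # for updated items and x:deleted tombstones for removed ones.
    url = feed_url + '?' + urllib.parse.urlencode({'since': cutoff_iso})
    feed = ET.parse(urllib.request.urlopen(url)).getroot()
    updated = feed.findall(ATOM + 'entry')
    deleted = [t.get('ref') for t in feed.findall(X + 'deleted')]  # 'ref' is assumed
    return updated, deleted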
