Re: [Demexp-dev] A protocol proposal to track modifications on server

2005-06-16 Par sujet David MENTRE
Hi Flix,

Many thanks for the feedback. Overall, I find your approach quite
interesting. From a programming point of view, it is quite simple to
implement, except that it will add once more new dependencies to the
code (zlib) and I need to find suitable binding. And I need to figure
how to handle endianness in OCaml.

I'll go for your approach.


Some comments on your comments:

[EMAIL PROTECTED] writes:

 1/ Concerning your proposed approach
 -I think that it is probably most efficient in terms of bandwidth.
 I would recommend using a binary tree (instead of 10-tree you seemed
 to suggest in your mail) because I would intuitively
 guess it is optimal in terms of bandwidth needed to check the
 timestamps (but not optimal in terms of server workload, see below).
 -I may put some heavy worldload on the server because each time an
 object is modified (mainly questions and tags), all timestamps
 hierarchically above the said object are to be modified. In this
 case a non-binary tree is better because the larger the order of the
 tree, the less parents to update.

If I would keep my algorithm, I would implement it with the tree width
defined as a constant that could be changed. And I would probably keep
the 10-tree approach.

 -I may be complex to implement (as compared to brute method, see below)
 because of the recursive nature of the algorithm and because the server
 and client trees do not have the same shape (there are always more
 objects on the server due to new questions etc.)

OCaml helps a lot when one need to encode algorithms. But that's true
that this is more complex than your approach.

 2/ Possible other approach: brute force

 That's 210 timestamps. Now, obviously, the time resolution of
 the timestamps should not be higher than the time resolution of the
 file update on the server. With 24 bits for a timestamp,
 your system can last for 2^24=16777216 and 1216/(30*24*365)=63,8 years.
 The uncompressed file would be:
 210 timestamps * 24 bits /(8*1024)=6 MB

I prefer a 32 bits timestamp (easier to code). With 31 bits (signed
integer), starting from 2005-01-01, it will allow timestamp until about
2068 if we keep the 1s resolution and until about 1 if we keep the
2mn resolution. I'll probably use 2s resolution (until about 2130).

At 32 bits, it gives about 8 MB uncompressed.

 You would need a file format to be able to assign each timestamp to its
 corresponding object but that should be easy.

Proposed format:

32 bits int: number of questions (Q)
32 bits int: number of tags (T)
32 bits int: number of participants (P)
32 bits int: question timestamp * N
32 bits int: tags timestamp * T
32 bits int: participant timestamp * P

All integers encoded in big endian format.

The tree first integers could be encoded in the XDR format separately.

Do you think it is an issue if the three timestamp zones (for questions,
tags and participants) are gzipped separately?

 Let me know what you thing.

I like it. :-) So I'm going to implement it.


Yours,
d.
-- 
pub  1024D/A3AD7A2A 2004-10-03 David MENTRE [EMAIL PROTECTED]
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A



___
Demexp-dev mailing list
Demexp-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/demexp-dev


[Demexp-dev] A protocol proposal to track modifications on server

2005-06-15 Par sujet David MENTRE
Hello,

I'm currently trying to design the protocol, data structures and
algorithms needed to imlement caching in the client.

The main idea is quite simple: 

  To each object in the server (participant, question or tag) associate
  a 64 bits timestamp (following the scheme suggested by FĂ©lix). Each
  time an object is modified, its timestamp is updated with latest
  modification real time.

  When the client needs to know whether its cache is synchronized or not
  with the server, it only needs to check the object timestamp on the
  server. If the client's local value is the same, it has latest
  version, otherwise it needs to fetch the latest one from server.


My only concern is regarding timestamp fetching. How to optimize it in
such a way that the client won't request timestamp of *all* objects on
the server?

I'm thinking at the following solution:

 - the server keeps the highest timestamp for groups of 10, 100, 1000,
   ... objects in a tree, the bigger the group, higher it is in the
   tree. 

   For example, highest timestamps for questions from 0 to 9, 10 to 19,
   ..., highest timestamp for questions from 0 to 99, etc.;

 - the client keeps the same data structure for its cache;

 - when the client needs to check its local cache validity, it requests
   the highest timestamp for questions from 0 to the maximum number of
   questions. For example, highest timestamp for questions from 0 to
   99. 

   If the timestamp given by server does not match the highest timestamp
   localy stored, the client requests the 10 sub-timestamps. For
   example, timestamps for questions from 0 to 9, 10 to 19, ..., 90 to
   99. For each one of them, the client check if it has the same value.

   If not, get individual question info for unmatching timestamp.


On the server or client side, when an object is modified, e.g. question
18, it is only needed to update a limited amount of timestamps,
e.g. from 10 to 19, from 0 to 99, from 0 to 999, ...

Obviously, the algorithm has an logarithmic complexity and is scalable
to a large amount of objects.

What do you think of it?

Any suggestion of suitable data structures? Any pitfalls to avoid?

Yours,
d.
-- 
pub  1024D/A3AD7A2A 2004-10-03 David MENTRE [EMAIL PROTECTED]
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A



___
Demexp-dev mailing list
Demexp-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/demexp-dev