1.
OK we misunderstood the input format - it is neighbour lists, not pairs
of contacts, Emre's example test data misled us.
2.
My challenge to my students still stands (modified in the light of our
new information).
a) Modify the Graham code to work on pairs of contacts
b) Find the -algorithmic- inefficiency in Graham's code and correct it.
My example (misunderstood) test data gives a clue to the algorithmic
inefficiency.
c) Solve the implementation inefficiencies in Graham's code to get down
to a solution that is O(e+v) or, at the very worst, O( (e+v) log (e+v))
I once again attach the scheme version of the Graham code.
3.
Technicalities:-
There are two inefficiencies in Graham's code
(a) the algorithmic inefficiency that is the subject of challenge 2(b).
(b) the use of inherently time inefficient LISP structures - this is
important, but -much- less important than (a). LISP/Scheme, of course,
offers alternative, more efficient, structures, or you can easily and
cheaply build your own.
4.
Cember's problem:
a) I repeat that this is a problem whose solution is LINEAR in the total
size of its input. It is therefore easily soluble, in Cember's practical
situation, without blowing up the server, even for tens of thousands of
subscribers. You can make that hundreds of thousands of subscribers,
given the likely e/v ratio (quite low) and the likely pattern of
contacts. O(e+v) is for finding ALL shortest paths starting at a give
node. We only want one.
b) IF the Cember SQL solution has the same -algorithmic- inefficiency as
Graham's solution that will explain the explosions.
c) Those on the mail lists who are proposing cacheing are missing the
point. If we keep the neighbour lists in primary memory, this problem is
trivial. If there is a performance problem, do -not- cache the shortest
paths (it is a list of size O(n^3)!) Cache the neighbour lists!! We know
when we update the neighbour lists in the database, so we can keep them
in a simple write through cache.
d) Language choice is not important. Whatever language the existing
system is written in, use that. PHP gives extensible arrays, efficiently
implemented as hash tables, and easy interface to the database. Given
that you then only have to write seven lines of code, even I would be
prepared to (in fact prefer to!) write it in PHP: then wash my hands, of
course. Writing this particular algorithm efficiently in LISP/Scheme
requires doing some unlispy things or writing some complicated code. Why
bother with the interfacing problems for no significant gain? Actually
you are exaggerating the interfacing problems.
Emre, put Cember in touch with me. I will solve their problem. It is
trivial.
After 0900 on Wednesday I will publish my solutions.
Sorry, Can Burak, shortest path is an -easy- problem!
CS
Can Burak Cilingir wrote:
[ ... ]
What you say is: Once your server is crunched and cached
the results of those queries, ok, it won't crash if the
same queries are made. But of course, each time brand new
queries with different Contacts are generated.
Could I make myself clear this time?
I was already clear on that but trying to implement caching somehow.
My question for the previous mail was not "is caching can handle the
load or not". It was the point of my first mail (do that once a day)
and understood that is not possible.
so my question is is there a better way to cache this? (see my
paragragh below "Maybe we need a trade-off here")
>function shortest-path (membera memberb)
>{
>if ispathcached (membera memberb)
Probably not.
You are absolutely right. When I browse such sites, I rarely click on
people whom I already know.
> p = getcachedpath (membera memberb)
> t = getcachedtime (membera memberb)
>//is cache still valid?
>for each member of p as m
> mt = getmodificationtimeofconnlist(m)
> if (mt > t)
> {
> np = regeneratecache(membera memberb)
> return np
> }
And you imagine connections are rock hard?
Maybe our good old acquaintance has just left the
network. I'm making the same query, you and me
but the network data has changed. So you have
to modify your cache. That means recalculating.
What an acquaintance! Anyway, again you had to
calculate. Lots of calculations, people are
clicking, think of 10.000 people network, a few
thousand online, every minute a few 10 people
are coming, partially connected and making queries
which are not cached yet.
Maybe we need a trade-off here.
we can cache some intermediate paths and somehow calculate "the not so
shortest path". a -> c is (a g j o d c) for a->d, you can assume (a g
j o d) although there may be a path such that (a w d) if you need a
connection, we have that. but if you need "the" shortest path this is
not the answer. I think this is better than showing nothing.
We are also talking about "shortest path" which is not an easy to
solve problem. and doing this over and over again. I am curious how
orkut handles that load (with the help of google's cluster? :)).
Emre S.
(define assoc-null
(lambda (q alist)
(let ((answer (assoc q alist)))
(cond
((not answer) null)
(else answer)))))
(define cdr-null
(lambda (x)
(cond
((null? x) null)
(else (cdr x)))))
(define shortest-path
(lambda (start end net)
(bfs end (list (list start)) net)))
(define bfs
(lambda (end queue net)
(if (null? queue)
null
(let ((path (car queue)))
(let ((node (car path)))
(if (eq? node end)
(reverse path)
(bfs end
(append (cdr queue)
(new-paths path node net))
net)))))))
(define new-paths
(lambda (path node net)
(map (lambda (n)
(cons n path))
(cdr-null (assoc-null node net)))))
_______________________________________________
cs-lisp mailing list
cs-lisp@cs.bilgi.edu.tr
http://church.cs.bilgi.edu.tr/lcg
http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp