1.
OK we misunderstood the input format - it is neighbour lists, not pairs of contacts, Emre's example test data misled us.

2.
My challenge to my students still stands (modified in the light of our new information).
a) Modify the Graham code to work on pairs of contacts
b) Find the -algorithmic- inefficiency in Graham's code and correct it. My example (misunderstood) test data gives a clue to the algorithmic inefficiency. c) Solve the implementation inefficiencies in Graham's code to get down to a solution that is O(e+v) or, at the very worst, O( (e+v) log (e+v))

I once again attach the scheme version of the Graham code.

3.
Technicalities:-
There are two inefficiencies in Graham's code

(a) the algorithmic inefficiency that is the subject of challenge 2(b).

(b) the use of inherently time inefficient LISP structures - this is important, but -much- less important than (a). LISP/Scheme, of course, offers alternative, more efficient, structures, or you can easily and cheaply build your own.

4.
Cember's problem:
a) I repeat that this is a problem whose solution is LINEAR in the total size of its input. It is therefore easily soluble, in Cember's practical situation, without blowing up the server, even for tens of thousands of subscribers. You can make that hundreds of thousands of subscribers, given the likely e/v ratio (quite low) and the likely pattern of contacts. O(e+v) is for finding ALL shortest paths starting at a give node. We only want one. b) IF the Cember SQL solution has the same -algorithmic- inefficiency as Graham's solution that will explain the explosions. c) Those on the mail lists who are proposing cacheing are missing the point. If we keep the neighbour lists in primary memory, this problem is trivial. If there is a performance problem, do -not- cache the shortest paths (it is a list of size O(n^3)!) Cache the neighbour lists!! We know when we update the neighbour lists in the database, so we can keep them in a simple write through cache. d) Language choice is not important. Whatever language the existing system is written in, use that. PHP gives extensible arrays, efficiently implemented as hash tables, and easy interface to the database. Given that you then only have to write seven lines of code, even I would be prepared to (in fact prefer to!) write it in PHP: then wash my hands, of course. Writing this particular algorithm efficiently in LISP/Scheme requires doing some unlispy things or writing some complicated code. Why bother with the interfacing problems for no significant gain? Actually you are exaggerating the interfacing problems.

Emre, put Cember in touch with me. I will solve their problem. It is trivial.

After 0900 on Wednesday I will publish my solutions.

Sorry, Can Burak, shortest path is an -easy- problem!

CS




Can Burak Cilingir wrote:

[ ... ]

What you say is: Once your server is crunched and cached
the results of those queries, ok, it won't crash if the
same queries are made. But of course, each time brand new
queries with different Contacts are generated.

Could I make myself clear this time?


I was already clear on that but trying to implement caching somehow. My question for the previous mail was not "is caching can handle the load or not". It was the point of my first mail (do that once a day) and understood that is not possible.

so my question is is there a better way to cache this? (see my paragragh below "Maybe we need a trade-off here")


 >function shortest-path (membera memberb)
 >{
 >if ispathcached (membera memberb)

Probably not.

You are absolutely right. When I browse such sites, I rarely click on people whom I already know.


 >     p = getcachedpath (membera memberb)
 >     t = getcachedtime (membera memberb)

 >//is cache still valid?
 >for each member of p as m
 >     mt = getmodificationtimeofconnlist(m)
 >     if (mt > t)
 >     {
 >         np = regeneratecache(membera memberb)
 >         return np
 >     }

And you imagine connections are rock hard?
Maybe our good old acquaintance has just left the
network. I'm making the same query, you and me
but the network data has changed. So you have
to modify your cache. That means recalculating.
What an acquaintance! Anyway, again you had to
calculate. Lots of calculations, people are
clicking, think of 10.000 people network, a few
thousand online, every minute a few 10 people
are coming, partially connected and making queries
which are not cached yet.

Maybe we need a trade-off here.


we can cache some intermediate paths and somehow calculate "the not so shortest path". a -> c is (a g j o d c) for a->d, you can assume (a g j o d) although there may be a path such that (a w d) if you need a connection, we have that. but if you need "the" shortest path this is not the answer. I think this is better than showing nothing.

We are also talking about "shortest path" which is not an easy to solve problem. and doing this over and over again. I am curious how orkut handles that load (with the help of google's cluster? :)).


Emre S.



(define assoc-null
  (lambda (q alist)
    (let ((answer (assoc q alist)))
    (cond
      ((not answer) null)
      (else answer)))))

(define cdr-null
  (lambda (x)
    (cond
      ((null? x) null)
      (else (cdr x)))))

(define shortest-path 
     (lambda (start end net)
       (bfs end (list (list start)) net)))

(define bfs 
     (lambda (end queue net)
       (if (null? queue)
           null
           (let ((path (car queue)))
             (let ((node (car path)))
               (if (eq? node end)
                   (reverse path)
                   (bfs end
                        (append (cdr queue)
                                (new-paths path node net))
                        net)))))))
     
(define new-paths 
  (lambda (path node net)
    (map (lambda (n)
                (cons n path))
            (cdr-null (assoc-null node net)))))

_______________________________________________
cs-lisp mailing list
cs-lisp@cs.bilgi.edu.tr
http://church.cs.bilgi.edu.tr/lcg
http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp

Cevap