On May 1, 2008, at 10:06 AM, Matt Mahoney wrote:
> --- "J. Andrew Rogers" <[EMAIL PROTECTED]> wrote:

>> Your model above tacitly predicates its optimality on a naive MCP
>> strategy, but is not particularly well-suited for it.  In short, this
>> means that you are assuming that the aggregate latency function for a
>> transaction over the network is a close proxy for the transaction
>> cost.  At one time this might have been a reasonable assumption, but
>> it becomes less true every year.

> That's true in my thesis but I dropped it in my CMR proposal.  Now I
> assume that peers operate in a hostile environment.  A message could be
> anything.  The protocol has to work even over unreliable UDP with
> forged source IP addresses.  The problem is sort of like building a
> brain out of neurons that are trying to kill each other.


(Yes, late. I do not have much free time.)

A brain where all the neurons are out to kill each other is a proper metaphor for the design problem. In real protocols, every time someone posited benevolence for some aspect it was promptly exploited.
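The "trust nothing about the sender" constraint can be made concrete: over UDP with forgeable source addresses, the only thing a peer can rely on is authentication carried inside the message itself. A minimal sketch in Python, assuming a shared secret for illustration only (a real peer-to-peer protocol would use per-peer public-key signatures, since a single shared key is itself a point of failure):

```python
import hmac
import hashlib
import os

# Hypothetical shared secret for the sketch; real protocols would use
# per-peer public keys rather than one global symmetric key.
KEY = os.urandom(32)

def seal(payload: bytes) -> bytes:
    # The claimed source IP is never consulted; all trust lives in the tag.
    tag = hmac.new(KEY, payload, hashlib.sha256).digest()
    return tag + payload

def open_msg(datagram: bytes):
    # Split off the 32-byte tag and recompute it over the payload.
    tag, payload = datagram[:32], datagram[32:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    # Constant-time compare; forged or corrupted datagrams yield None.
    return payload if hmac.compare_digest(tag, expected) else None

msg = seal(b"PUT key=42 value=hello")
assert open_msg(msg) == b"PUT key=42 value=hello"
# A datagram with a bogus tag is rejected no matter what source it claims.
assert open_msg(b"\x00" * 32 + b"PUT key=42 value=hello") is None
```

The point of the sketch is the asymmetry: a hostile peer can send anything, so validity must be a property of the bytes received, never of where they appear to come from.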


> In my thesis, I asked whether it was possible even in theory to build a
> large scale distributed index.  None existed in 1997 and none exists
> today.  The best-known examples of internet-wide databases were USENET,
> which uses O(n^2) storage (every site replicates every post), and DNS,
> which is O(n) (assuming it grows in depth with constant branching
> factor, although it doesn't really) but is vulnerable at the root
> servers.  Centralized search engines are also O(n^2): you need O(n)
> servers for n clients, and each server replicates the O(n) index.  This
> creates an incentive for engines to merge to save resources, resulting
> in a monopoly.  (Who has the resources to compete with Google?)
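The storage claims above reduce to back-of-envelope arithmetic. A sketch with n peers each contributing one unit of data (the clients-per-server ratio is a made-up illustrative constant, not a measurement):

```python
def usenet_storage(n):
    # Full replication: every one of n sites stores all n units of data.
    return n * n                            # O(n^2)

def dns_storage(n):
    # Hierarchical delegation: each record stored once, plus a pointer
    # tree roughly proportional to the number of leaves.
    return 2 * n                            # O(n)

def search_engine_storage(n, clients_per_server=1000):
    # n clients need O(n) front-end servers, and each server replicates
    # the O(n)-sized index, so the total is again O(n^2).
    servers = -(-n // clients_per_server)   # ceiling division
    return servers * n

for n in (10_000, 100_000, 1_000_000):
    print(n, usenet_storage(n), dns_storage(n), search_engine_storage(n))
```

Growing n by 10x grows the replicated architectures by roughly 100x but DNS-style delegation by only 10x, which is the whole argument in miniature.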


There is an increasingly strong political incentive (between countries) to create distributed indexes, but quite frankly the technology does not exist. This was something I studied in earnest when various governments started demanding such guarantees. To the best of my knowledge, we do not have mathematics that can support the guarantees desired, though decentralized indexes are certainly practical if one ignores certain considerations that are politically important.

Something to understand about the big server clusters: as commonly implemented, the online server cluster is independent of the content generation cluster. Queries may be very cheap to serve even if the aggregation and analytics process is expensive. Compute a result once and serve it to the world a thousand times. The real problems occur when the data set is not sufficiently static that this trick is plausible. Fortunately, no one has noticed the man behind the curtain (yet).
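The offline/online split described above can be sketched as a batch job that materializes results once, with the serving tier doing only cheap lookups. Names and data are illustrative, not any particular engine's design:

```python
def build_index(documents):
    # Expensive offline step: run once per batch over the whole corpus,
    # never per query.  Maps each word to the documents containing it.
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.split()):
            index.setdefault(word, []).append(doc_id)
    return index

def serve_query(index, word):
    # Cheap online step: a single lookup, amortizing the build cost
    # across every query that hits the precomputed result.
    return index.get(word, [])

docs = {1: "large scale distributed index", 2: "distributed hash table"}
idx = build_index(docs)
assert sorted(serve_query(idx, "distributed")) == [1, 2]
assert serve_query(idx, "usenet") == []
```

The economics break exactly where the text says they do: if the corpus churns faster than the batch job can rebuild, the cheap-lookup tier starts serving stale answers or the expensive step moves onto the query path.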

Losing to Google is predicated on following their path, and they occupy a space where the computer science is transparently inadequate. It does not take much of a qualitative shift in the market to kill a company in that position. There is plenty of vulnerability left in the market.

I would argue, from a business perspective, that most of the value with respect to distribution is in the metadata protocol, and virtually all such protocols in practice are based on naive designs that ignore the literature. A really strong metadata protocol that could be standardized would generate a hell of a lot of value. Past that, whoever controls the essential data under that protocol would win, and for better or worse, Google is largely not responding to this. There are many types of data they have no capacity to handle in bulk. This is not so much a criticism of Google as an observation about their actual behavior.

Cheers,

J. Andrew Rogers



-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244&id_secret=103754539-40ed26
Powered by Listbox: http://www.listbox.com
