On May 1, 2008, at 10:06 AM, Matt Mahoney wrote:
> --- "J. Andrew Rogers" <[EMAIL PROTECTED]> wrote:
>> Your model above tacitly predicates its optimality on a naive MCP
>> strategy, but is not particularly well-suited for it. In short, this
>> means that you are assuming that the aggregate latency function for a
>> transaction over the network is a close proxy for the transaction
>> cost. At one time this might have been a reasonable assumption, but
>> it becomes less true every year.
>
> That's true in my thesis but I dropped it in my CMR proposal. Now I
> assume that peers operate in a hostile environment. A message could be
> anything. The protocol has to work even over unreliable UDP with
> forged source IP addresses. The problem is sort of like building a
> brain out of neurons that are trying to kill each other.
(Yes, late. I do not have much free time.)
A brain where all the neurons are out to kill each other is a proper
metaphor for the design problem. In real protocols, every time someone
posited benevolence for some aspect, it was promptly exploited.
> In my thesis, I asked whether it was possible even in theory to build
> a large scale distributed index. None existed in 1997 and none exists
> today. The best known examples of internet wide databases were USENET,
> which uses O(n^2) storage, and DNS, which is O(n) (assuming it grows
> in depth with constant branching factor, although it doesn't really)
> but is vulnerable at the root servers. Centralized search engines are
> also O(n^2) because you need O(n) servers for n clients. This creates
> an incentive for engines to merge to save resources, resulting in a
> monopoly. (Who has the resources to compete with Google?)
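The scaling claims above can be sanity-checked with back-of-the-envelope arithmetic. The constants below are invented for illustration, not taken from the thesis:

```python
# With n peers each contributing one unit of content:
#   - USENET-style flooding: every peer stores every peer's content,
#     so total storage is n copies of n units -> O(n^2).
#   - DNS-style hierarchy: each record has one authoritative copy
#     (the O(log n) lookup path adds no bulk storage) -> O(n).
#   - Centralized search: a full index of n items replicated across
#     O(n) servers (one server per fixed block of clients) -> O(n^2).

def usenet_storage(n: int) -> int:
    return n * n  # n full copies of the whole corpus


def dns_storage(n: int) -> int:
    return n  # one authoritative copy per record


def central_storage(n: int, clients_per_server: int = 1000) -> int:
    servers = max(1, n // clients_per_server)  # illustrative ratio
    return servers * n  # full index replicated on each server


for n in (10**3, 10**6):
    print(n, usenet_storage(n), dns_storage(n), central_storage(n))
```

At a million peers, the two O(n^2) designs are six orders of magnitude more expensive than the hierarchical one, which is the economic pressure toward monopoly described above.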
There is an increasingly strong political incentive (between
countries) to create distributed indexes, but quite frankly the
technology does not exist. This was something I studied in earnest
when various governments started demanding such guarantees. To the
best of my knowledge, we do not have mathematics that can support the
guarantees desired, though decentralized indexes are certainly
practical if one ignores certain considerations that are politically
important.
Something to understand about the big server clusters: as commonly
implemented, the online server cluster is independent of the content
generation cluster. Queries may be very cheap to serve even if the
aggregation and analytics process is expensive. Compute a result once
and serve it to the world a thousand times. The real problems occur
when the data set is not sufficiently static that this trick is
plausible. Fortunately, no one has noticed the man behind the curtain
(yet).
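The compute-once, serve-a-thousand-times trick is essentially memoization of the expensive offline aggregation. A toy sketch, with an invented query and cost model:

```python
import functools

# Track how many times the expensive aggregation actually runs.
CALLS = {"expensive": 0}


@functools.lru_cache(maxsize=None)
def query(term: str) -> int:
    """Serve a query; the expensive work happens only on a cache miss."""
    CALLS["expensive"] += 1
    # Stand-in for the expensive aggregation/analytics pass.
    return sum(ord(c) for c in term)


# A thousand identical queries cost one aggregation.
for _ in range(1000):
    query("distributed index")
print(CALLS["expensive"])  # prints 1
```

The trick only works because the cached result stays valid between queries; once the data set churns faster than the aggregation pass, every query risks being a cache miss, which is exactly the problem described above.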
Losing to Google is predicated on following their path, and they
occupy a space where the computer science is transparently
inadequate. It does not take much of a qualitative shift in the
market to kill a company in that position. There is plenty of
vulnerability left in the market.
I would argue, from a business perspective, that most of the value
with respect to distribution is in the metadata protocol, and
virtually all existing metadata protocols are based on naive designs
that ignore the literature in practice. A really strong metadata
protocol that could be standardized
would generate a hell of a lot of value. Past that, whoever controls
the essential data under that protocol would win, and for better or
worse, Google is largely not responding to this. There are many types
of data they have no capacity to handle in bulk. This is not so much a
criticism of Google but an observation about their actual behavior.
Cheers,
J. Andrew Rogers
-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/