Well, I wouldn't be surprised if Google does store as much data as
possible in memory across thousands of machines. I doubt they're using a
regular SQL database; it's probably some kind of structure ultra-optimized
for this particular purpose.
So you'd need a couple hundred machines to hold all this data, and
then a way to search across them all.
I don't want to be someone who says that if it could be done, Yahoo and
Microsoft would have already done it. That's what the insurance company
that put up the $10M for SpaceShipOne winning the X-PRIZE thought when
they bet that the odds were like a hole-in-one at a golf
tournament. NASA also probably thought it was preposterous that a group
of privately funded civilians could build a space ship in that time
frame with only $20M-$30M.
BUT... when it comes to search, it's all about relevance, not speed;
so I would say the money is in your deep-search algorithm, and you can
worry about the implementation part later. After that, patent it, then
either get funding to go full bore or sell it.
From a strategy perspective, who says Google is the best anyway? Teoma
was touting that it was the best (http://www.teoma.com/), and even if it
were, Google has such an awesome brand and mindshare that it will take
something mind-blowing to knock them off their throne. Or check out
www.snap.com; that's a neat approach to searching (the preview pane in
the results is cool), and I doubt anyone on this list has even seen that
site.
What Google is not so good at is intranet search, as their algorithm
is totally optimized for internet searching, and there is no clear king
in the enterprise-search arena. There are some powerful players, though
(inXight, Convera, Verity, Coveo, Autonomy, etc.). That's where your
chances are, and thousands of terabytes of memory aren't needed.
On 12/22/05, Jason Parkils <[EMAIL PROTECTED]> wrote:
I have already identified a potential competitive advantage over Google.
Currently they store all their site data in custom databases. As everyone
knows, database access is notoriously slow. So if the metadata were moved
into RAM (the "application" scope) in onApplicationStart, you would be able
to perform deeper search functions in the same amount of time, getting you
better results. Nowadays, you could install several terabytes of RAM on each
machine (64-bit computing), so space shouldn't be an issue. The only thing
is that onApplicationStart will take several hours to load the data into RAM,
but the servers will be clustered, so that should only happen once.
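For what it's worth, the caching idea described above could be sketched in an Application.cfc along these lines. This is a minimal, hypothetical sketch, not a tested implementation: the application name, the "sites" datasource, and the table/column names are all made up for illustration.

```
<!--- Application.cfc: sketch of loading metadata into the application
      scope once at startup. Datasource and table names are hypothetical. --->
<cfcomponent>
    <cfset this.name = "searchApp">
    <!--- keep the application scope alive long-term so the cache persists --->
    <cfset this.applicationTimeout = createTimeSpan(30, 0, 0, 0)>

    <cffunction name="onApplicationStart" returntype="boolean" output="false">
        <!--- pull the site metadata out of the database one time --->
        <cfquery name="local.meta" datasource="sites">
            SELECT url, title, keywords
            FROM site_metadata
        </cfquery>
        <!--- every later request reads application.siteMeta from RAM --->
        <cfset application.siteMeta = local.meta>
        <cfreturn true>
    </cffunction>
</cfcomponent>
```

Note that onApplicationStart also re-runs whenever the application times out or the server restarts, not strictly "only once", so the multi-hour load would recur on each restart of each cluster node.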
So no one knows of any good CFCs for doing Google-style searching? The
important thing is that they are open source so that I can improve upon
them. Are CFCs even the best way to go? I don't mind doing it through
custom tags or anything else. I was just told that CFCs were the best.
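As a rough illustration of what a search CFC over in-memory data might look like, here is a bare-bones sketch. The component and column names are made up, it assumes a query object has already been cached as application.siteMeta, and it is a naive linear scan with no ranking, so it is nothing like a Google-style engine:

```
<!--- Search.cfc: hypothetical naive keyword search over cached metadata.
      Assumes application.siteMeta is a query with url/title/keywords. --->
<cfcomponent>
    <cffunction name="search" access="public" returntype="array" output="false">
        <cfargument name="term" type="string" required="true">
        <cfset var results = arrayNew(1)>
        <!--- linear scan of the in-memory query: no index, no relevance --->
        <cfloop query="application.siteMeta">
            <cfif findNoCase(arguments.term, title) OR
                  findNoCase(arguments.term, keywords)>
                <cfset arrayAppend(results, url)>
            </cfif>
        </cfloop>
        <cfreturn results>
    </cffunction>
</cfcomponent>
```

The relevance ranking is where the hard part (and the value) is; some kind of inverted index beats a linear scan long before the data approaches terabytes.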
Also, is it worth getting CF Enterprise edition for this, or is Standard
OK?
Jason Parkils
----------------------------------------------------------
You are subscribed to cfcdev. To unsubscribe, send an email to
[email protected] with the words 'unsubscribe cfcdev' as the subject of the
email.
CFCDev is run by CFCZone (www.cfczone.org) and supported by CFXHosting
(www.cfxhosting.com).
An archive of the CFCDev list is available at
www.mail-archive.com/[email protected]