Well, I wouldn't be surprised if Google does store as much data as possible in memory across thousands of machines. I doubt they're using a regular SQL database; it's probably some kind of structure ultra-optimized for this particular purpose.

So you'd need a couple hundred machines to load all this data, plus a way to search across them all.
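To make that concrete, here's a hypothetical sketch (in Python, since the actual structure Google uses isn't public) of the scatter-gather pattern such a cluster would need: each machine holds one in-memory shard of the index, a query is fanned out to every shard, and the partial results are merged. All the names here (`Shard`, `shard_for`, `scatter_gather`) and the toy corpus are invented for illustration; a real system would distribute the shards across machines instead of keeping them in one process.

```python
# Sketch of scatter-gather search over in-memory index shards.
# Everything here is a toy stand-in: real shards live on separate
# machines and results are merged over the network.

NUM_SHARDS = 4

def shard_for(doc_id):
    """Pick a shard by hashing the document id."""
    return hash(doc_id) % NUM_SHARDS

class Shard:
    """One machine's slice of the index, held entirely in memory."""
    def __init__(self):
        self.index = {}  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        for term in text.lower().split():
            postings = self.index.setdefault(term, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1

    def search(self, term):
        return self.index.get(term.lower(), {})

def scatter_gather(shards, term, k=3):
    """Fan the query out to every shard, merge the partial
    results, and return the top-k doc ids by term frequency."""
    merged = {}
    for shard in shards:
        merged.update(shard.search(term))
    return sorted(merged, key=merged.get, reverse=True)[:k]
```

The point of the pattern is that each shard's lookup is a pure in-memory hash probe, so query latency is dominated by the slowest shard plus the merge, not by disk access.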

I don't want to be someone who says that if it could be done, Yahoo and Microsoft would have already done it. That's what the insurance company that provided the $10M for SpaceShipOne winning the X-PRIZE thought when they bet that the odds were like those of a hole-in-one at a golf tournament. NASA also probably thought it was preposterous that a group of privately funded civilians could build a spaceship in that time frame with only $20M-$30M.

BUT... when it comes to search, it's all about relevancy, not speed; so I would say the money is in your deep-search algorithm, and you can worry about the implementation part later. After that, patent it, then either get funding to go full bore or sell it.

From a strategy perspective, who says Google is the best anyway? Teoma was touting that they were the best (http://www.teoma.com/), and even if they were, Google has such an awesome brand and so much mindshare that it will take something mind-blowing to knock them off their throne. Or check out www.snap.com; that's a neat approach to searching (the preview pane in the results is cool), and I doubt anyone on this list has even seen that site.

What Google is not so good at is intranet search: their algorithm is totally optimized for internet searching, and there is no clear king in the enterprise-search arena. There are some powerful players, though (inXight, Convera, Verity, Coveo, Autonomy, etc.). That's where your chances are, and no thousands of terabytes of memory are needed.


On 12/22/05, Jason Parkils <[EMAIL PROTECTED]> wrote:
I have already identified a potential competitive advantage over Google.
Currently they store all their site data in custom databases. As everyone
knows, database access is notoriously slow. So if the metadata were moved
into RAM (the "application" scope) in onApplicationStart, you would be able to
perform deeper search functions in the same amount of time, getting you
better results. Nowadays you could install several terabytes of RAM in each
machine (64-bit computing), so space shouldn't be an issue. The only thing
is that onApplicationStart will take several hours to load the data into RAM,
but the servers will be clustered, so that should only happen once.

So no one knows of any good CFCs to do Google-style searching? The
important thing is that they are open source so that I can improve upon
them. Are CFCs even the best way to go? I don't mind doing it through custom
tags or anything else; I was just told that CFCs were the best.

Also, is it worth it to get CF Enterprise edition for this or is Standard
ok?

Jason Parkils






CFCDev is run by CFCZone (www.cfczone.org) and supported by CFXHosting 
(www.cfxhosting.com).

An archive of the CFCDev list is available at 
www.mail-archive.com/[email protected]
