[uf-discuss] Bases

Scott Reynen Mon, 05 Dec 2005 05:47:55 -0800

Charles Iliya Krempeaux wrote:

On 12/5/05, Chris Messina <[EMAIL PROTECTED]> wrote:

On 12/4/05, Scott Reynen <[EMAIL PROTECTED]> wrote:

Personally, I suspect there's just not enough microformatted
content out there yet to make it worth Google's cycles parsing
it." [2]. But I thought it better to try and prove myself wrong with
some code than to just speculate about it.


Um, why are we waiting for Google? I mean, besides technorati, aren't
microformats kind of the next frontier for "smart" search engines?

The "web as distributed database" sounds pretty damn appealing to me.


If you want to search all of it, and want to do it in a reasonable
amount of time, indexing helps.

Right, that's the first problem I ran into. If you want to crawl the whole web, you have to index the whole web. And there's not enough microformatted data out there to be worth indexing the whole web to get at it. Even restricting the crawler to one node away from a found microformat, only 293 out of 5163 (5%) URLs currently contain microformats. Crawling the entire web, that percentage quickly approaches zero. Google Base, on the other hand, gets valuable structured data out of 100% of submissions. Advantage Google.
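To make the "one node away" restriction concrete, here is a minimal sketch of how such a hit rate could be measured. It is not the crawler described above: the in-memory `pages` and `links` dicts are hypothetical stand-ins for fetched documents and their outgoing links, and the detector only looks for the hCard root class name "vcard".

```python
from html.parser import HTMLParser


class HCardDetector(HTMLParser):
    """Flags a page as microformatted if any element has the class "vcard"."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # class is a space-separated list, so match whole tokens only
            if name == "class" and value and "vcard" in value.split():
                self.found = True


def has_hcard(html):
    detector = HCardDetector()
    detector.feed(html)
    return detector.found


def one_hop_hit_rate(pages, links, seeds):
    """Return (hits, visited) for URLs exactly one link away from the seeds.

    pages: dict of url -> html (hypothetical stand-in for fetched documents)
    links: dict of url -> list of outgoing urls
    seeds: urls already known to contain a microformat
    """
    frontier = set()
    for seed in seeds:
        frontier.update(links.get(seed, []))
    frontier -= set(seeds)  # count only newly discovered URLs
    hits = sum(1 for url in frontier if has_hcard(pages.get(url, "")))
    return hits, len(frontier)
```

Dividing hits by visited gives the kind of percentage quoted above; a real crawler would also need fetching, link extraction, and checks for microformats other than hCard.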


David Janes -- BlogMatrix wrote:

we have developed a crawler that collects FOAF profiles from the Web and
uploads them into Google Base.

I've been thinking about doing the same with the Microformat Base data, but I don't really care to deal with the potential copyright issues:

If you want to be removed from Google Base, please send us a mail and we will remove you.

If anyone else wants to be responsible for that, I'd be glad to make an Atom feed of hCards, which you could convert to a Google Base upload format.
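For anyone curious what an Atom feed of hCards might look like, here is a rough sketch using only the Python standard library. The function name, its parameters, and the dict keys ("url", "name", "html", "updated") are all hypothetical, and it stops at Atom: the Google Base upload format is not covered here.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def hcards_to_atom(feed_id, title, updated, author_name, cards):
    """Build an Atom feed string with one entry per hCard.

    cards: list of dicts with hypothetical keys
           "url", "name", "html", "updated" (RFC 3339 timestamp strings)
    """
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace

    def atom(tag):
        return "{%s}%s" % (ATOM_NS, tag)

    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = author_name

    for card in cards:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = card["url"]
        ET.SubElement(entry, atom("title")).text = card["name"]
        ET.SubElement(entry, atom("updated")).text = card["updated"]
        ET.SubElement(entry, atom("link"), href=card["url"])
        # type="html" carries the hCard markup as escaped text
        content = ET.SubElement(entry, atom("content"), type="html")
        content.text = card["html"]

    return ET.tostring(feed, encoding="unicode")
```

Each entry keeps the original hCard markup in its content element, so whoever does the conversion can re-parse the hCard properties instead of relying on the entry title.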


Peace,
Scott
_______________________________________________
microformats-discuss mailing list
[email protected]
http://microformats.org/mailman/listinfo/microformats-discuss
