Charles Iliya Krempeaux wrote:

On 12/5/05, Chris Messina <[EMAIL PROTECTED]> wrote:
On 12/4/05, Scott Reynen <[EMAIL PROTECTED]> wrote:

"Personally, I suspect there's just not enough microformatted
content out there yet to make it worth Google's cycles parsing
it." [2]. But I thought it better to try and prove myself wrong with
some code than to just speculate about it.

Um, why are we waiting for Google? I mean, besides technorati, aren't
microformats kind of the next frontier for "smart" search engines?

The "web as distributed database" sounds pretty damn appealing to me.

If you want to search all of it, and want to do it in a reasonable
amount of time, indexing helps.

Right, that's the first problem I ran into. If you want to search the whole web, you have to crawl and index the whole web. And there's not enough microformatted data out there to be worth indexing the whole web to get at it. Even restricting the crawler to one node away from a found microformat, only 293 out of 5163 URLs (about 5.7%) currently contain microformats. Across the entire web, that percentage quickly approaches zero. Google Base, on the other hand, gets valuable structured data out of 100% of submissions. Advantage Google.
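
For a rough sense of how a hit rate like that could be measured, here is a minimal sketch in Python. It is not the crawler described above: the set of root class names and the helper names (`has_microformat`, `hit_rate`) are assumptions made for illustration, and a real parser would check much more than the class attribute.

```python
from html.parser import HTMLParser

# Assumption: a page counts as "containing microformats" if any element
# carries one of these root class names (hCard, hCalendar, hReview).
# The list is illustrative, not exhaustive.
ROOT_CLASSES = {"vcard", "vevent", "hreview"}


class MicroformatDetector(HTMLParser):
    """Sets `found` once any start tag has a known root class."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A class attribute may hold several space-separated names.
            if name == "class" and value and ROOT_CLASSES & set(value.split()):
                self.found = True


def has_microformat(html):
    """Return True if the HTML string contains a known root class."""
    detector = MicroformatDetector()
    detector.feed(html)
    return detector.found


def hit_rate(hits, total):
    """Fraction of crawled URLs that contained a microformat."""
    return hits / total if total else 0.0


if __name__ == "__main__":
    print(has_microformat('<div class="vcard"><span class="fn">Scott</span></div>'))
    # The counts quoted above: 293 hits out of 5163 crawled URLs.
    print(f"{hit_rate(293, 5163):.1%}")
```

The check is deliberately cheap, since a crawler would run it on every page, and most pages will fail it.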

David Janes -- BlogMatrix wrote:

we have developed a crawler that collects FOAF profiles from the Web and
uploads them into Google Base.

I've been thinking about doing the same with the Microformat Base data, but I don't really care to deal with the potential copyright issues:

If you want to be removed from Google Base, please send us a mail and we will remove you.

If anyone else wants to be responsible for that, I'd be glad to make an Atom feed of hCards, which you could convert to a Google Base upload format.
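
A minimal sketch of what such a feed could look like, using only the Python standard library. The dictionary keys (`fn`, `url`), the feed title, and the function name `hcards_to_atom` are hypothetical, chosen for illustration; the conversion to a Google Base upload format is not shown, since it would depend on that format's own requirements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def hcards_to_atom(cards, feed_id, updated):
    """Build an Atom feed string with one entry per hCard.

    `cards` is a list of dicts with hypothetical keys "fn" (formatted
    name) and "url" (page where the hCard was found). `updated` is an
    RFC 3339 timestamp string supplied by the caller.
    """
    # Serialize Atom as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("title")).text = "hCards"
    ET.SubElement(feed, _atom("id")).text = feed_id
    ET.SubElement(feed, _atom("updated")).text = updated
    for card in cards:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("title")).text = card["fn"]
        # Using the source URL as the entry id is an assumption.
        ET.SubElement(entry, _atom("id")).text = card["url"]
        ET.SubElement(entry, _atom("updated")).text = updated
        ET.SubElement(entry, _atom("link"), href=card["url"])
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    sample = [{"fn": "Example Person", "url": "http://example.org/about"}]
    print(hcards_to_atom(sample, "http://example.org/hcards", "2005-12-05T00:00:00Z"))
```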

Peace,
Scott
