Charles Iliya Krempeaux wrote:
> On 12/5/05, Chris Messina <[EMAIL PROTECTED]> wrote:
>> On 12/4/05, Scott Reynen <[EMAIL PROTECTED]> wrote:
>>> "Personally, I suspect there's just not enough microformatted
>>> content out there yet to make it worth Google's cycles parsing
>>> it." [2]. But I thought it better to try and prove myself wrong
>>> with some code than to just speculate about it.
>> Um, why are we waiting for Google? I mean, besides Technorati, aren't
>> microformats kind of the next frontier for "smart" search engines?
>> The "web as distributed database" sounds pretty damn appealing to me.
> If you want to search all of it, and want to do it in a reasonable
> amount of time, indexing helps.
Right, that's the first problem I ran into. If you want to crawl the
whole web, you have to index the whole web. And there's not enough
microformatted data out there to be worth indexing the whole web to
get at it. Even with the crawler restricted to pages one node away from
a found microformat, only 293 out of 5163 URLs (about 5.7%) currently
contain microformats. Across the entire web, that percentage quickly
approaches zero. Google Base, on the other hand, gets valuable
structured data out of 100% of submissions. Advantage Google.
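The restricted crawl described above can be sketched roughly as follows. This is a minimal illustration, not the crawler that produced the 293/5163 figure. It uses an in-memory dict in place of real HTTP fetches. A page counts as microformatted if any element carries one of the root classes vcard, vevent or hreview. The names PageScanner, scan and restricted_crawl are hypothetical. It also builds a small index from microformat type to URLs, the kind of index a search over the results would query.

```python
from collections import deque
from html.parser import HTMLParser

# Root class names for hCard, hCalendar and hReview. Assumption: a page
# "contains microformats" if any element carries one of these classes.
ROOT_CLASSES = {"vcard", "vevent", "hreview"}


class PageScanner(HTMLParser):
    """Collects microformat root classes and <a href> links from one page."""

    def __init__(self):
        super().__init__()
        self.formats = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.formats.update(c for c in classes if c in ROOT_CLASSES)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def scan(html):
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.formats, scanner.links


def restricted_crawl(web, seeds):
    """Breadth-first crawl that only follows links out of pages containing
    microformats, so apart from the seeds every crawled page is linked
    directly from a microformatted page. `web` is a hypothetical
    URL -> HTML dict standing in for HTTP fetches.
    Returns (index, hits, crawled); index maps microformat type -> URLs."""
    index = {}
    seen = set(seeds)
    queue = deque(seeds)
    hits = crawled = 0
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:
            continue  # unreachable page: skipped, not counted
        crawled += 1
        formats, links = scan(html)
        for fmt in formats:
            index.setdefault(fmt, set()).add(url)
        if formats:
            hits += 1
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return index, hits, crawled


# Tiny hypothetical web: page "d" has an hCard, but it is only linked
# from "b", which has no microformats, so the crawl never reaches it.
web = {
    "a": '<div class="vcard"><a href="b">b</a> <a href="c">c</a></div>',
    "b": '<p>Nothing here. <a href="d">d</a></p>',
    "c": '<div class="vevent hreview"><a href="e">e</a></div>',
    "d": '<div class="vcard">unreached</div>',
    "e": "<p>plain page</p>",
}
index, hits, crawled = restricted_crawl(web, ["a"])
# prints: 2 of 4 URLs contain microformats (50.0%)
print(f"{hits} of {crawled} URLs contain microformats ({hits / crawled:.1%})")
```

The same ratio applied to the numbers above is 293 / 5163, or about 5.7%. Note the trade-off the sketch makes visible: the hCard on page "d" is missed. Restricting the crawl keeps the hit rate up only by leaving parts of the web unvisited.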
David Janes -- BlogMatrix wrote:
> we have developed a crawler that collects FOAF profiles from the Web and
> uploads them into Google Base.
I've been thinking about doing the same with the Microformat Base
data, but I don't really care to deal with the potential copyright
issues:
> If you want to be removed from Google Base, please send us a mail
> and we will remove you.
If anyone else wants to be responsible for that, I'd be glad to make
an Atom feed of hCards, which you could convert to a Google Base
upload format.
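One possible shape for such a feed is sketched below using Python's standard xml.etree.ElementTree. It is minimal and assumes each hCard is carried as XHTML content inside an Atom entry. The sample contacts, the tag: IDs, the feed author name and the helper names hcard_div and hcard_feed are all hypothetical. The conversion into a Google Base upload format is not shown.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", ATOM)        # Atom as the default namespace
ET.register_namespace("xhtml", XHTML)


def hcard_div(card):
    """Wrapper div required by Atom type="xhtml" content, holding one
    hCard: class="vcard" root, "fn url" on a link to the homepage."""
    wrapper = ET.Element(f"{{{XHTML}}}div")
    vcard = ET.SubElement(wrapper, f"{{{XHTML}}}div", {"class": "vcard"})
    link = ET.SubElement(
        vcard, f"{{{XHTML}}}a", {"class": "fn url", "href": card["url"]}
    )
    link.text = card["name"]
    return wrapper


def hcard_feed(cards, feed_id, updated):
    """Build an Atom feed with one entry per hCard; returns XML text."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "hCards"
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    author = ET.SubElement(feed, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = "Microformat Base"
    for i, card in enumerate(cards):
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = f"{feed_id}#card-{i}"
        ET.SubElement(entry, f"{{{ATOM}}}title").text = card["name"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
        content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "xhtml"})
        content.append(hcard_div(card))
    return ET.tostring(feed, encoding="unicode")


cards = [
    {"name": "Jane Doe", "url": "http://example.org/jane"},
    {"name": "John Roe", "url": "http://example.org/john"},
]
xml_text = hcard_feed(cards, "tag:example.org,2005:hcards", "2005-12-05T00:00:00Z")
print(xml_text)
```

Keeping the hCard markup intact inside each entry means a consumer can either read the Atom titles directly or run any hCard parser over the content.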
Peace,
Scott
_______________________________________________
microformats-discuss mailing list
http://microformats.org/mailman/listinfo/microformats-discuss