Re: [uf-discuss] Storing Microformats

Ryan King, Sun, 23 Sep 2007 18:40:30 -0700

On Sep 17, 2007, at 12:44 PM, Paul Kinlan wrote:

> I have created a C#/.Net Stream-based Microformat parser
> (http://www.codeplex.com/microformat) and I am trying to create some
> reference applications to show it off.
>
> I am in the process of creating an "Operator"-like plugin for IE (it
> currently parses and displays the microformats that have been found on
> a page).
>
> One of the other ideas that I am toying with is a Microformat spider
> that crawls the web looking for microformats, storing them and then
> allowing them to be searched. My question is: How are people storing
> the data present in microformats so that they can be searched,
> maintained and consumed in applications, etc.?

In short, I use MySQL tables: one for each microformat, and one for each elemental type that can be many-to-many (images, photos, tags, etc.). Those elemental tables then have polymorphic many-to-many relationships with the tables for the formats themselves.
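A minimal sketch of that layout, assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL. The table and column names (hcards, hreviews, tags, taggings, owner_type, owner_id) and the helpers tag() and tags_for() are hypothetical; the post does not give the actual schema. The key piece is the polymorphic join table: each row records which format table (owner_type) and which row in it (owner_id) a tag belongs to, so one tags table can serve every format.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per microformat.
CREATE TABLE hcards   (id INTEGER PRIMARY KEY, fn TEXT, url TEXT);
CREATE TABLE hreviews (id INTEGER PRIMARY KEY, summary TEXT, rating INTEGER);

-- One table per shared elemental type, e.g. tags.
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);

-- Polymorphic join: (owner_type, owner_id) may point at any format table.
CREATE TABLE taggings (
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    owner_type TEXT    NOT NULL,
    owner_id   INTEGER NOT NULL,
    PRIMARY KEY (tag_id, owner_type, owner_id)
);
""")


def tag(owner_type, owner_id, name):
    """Attach tag `name` to row `owner_id` of format table `owner_type`."""
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
    (tag_id,) = conn.execute(
        "SELECT id FROM tags WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO taggings (tag_id, owner_type, owner_id) "
        "VALUES (?, ?, ?)",
        (tag_id, owner_type, owner_id),
    )


def tags_for(owner_type, owner_id):
    """Return the sorted tag names attached to one format row."""
    rows = conn.execute(
        """SELECT t.name FROM tags t
           JOIN taggings g ON g.tag_id = t.id
           WHERE g.owner_type = ? AND g.owner_id = ?
           ORDER BY t.name""",
        (owner_type, owner_id),
    )
    return [name for (name,) in rows]


conn.execute("INSERT INTO hcards (id, fn, url) VALUES (1, 'Jane Doe', 'http://example.com/')")
conn.execute("INSERT INTO hreviews (id, summary, rating) VALUES (1, 'Good parser', 5)")
tag("hcards", 1, "microformats")
tag("hreviews", 1, "microformats")
tag("hreviews", 1, "parsing")
print(tags_for("hreviews", 1))  # ['microformats', 'parsing']
```

The trade-off of a polymorphic join is that the database cannot enforce a foreign key on owner_id, because it points at a different table depending on owner_type; the application has to keep those references consistent.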

We also build search indexes, currently using Ferret [http://ferret.davebalmain.com/trac/], but hopefully we'll soon switch to our standard Lucene infrastructure at Technorati.
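Ferret (a Ruby port of Lucene) and Lucene are full-text search libraries built around an inverted index. The toy Python sketch below shows only that core idea; the class and function names are hypothetical and are not Ferret's or Lucene's API. Each term maps to the ids of the stored microformat records containing it, and an AND query intersects those sets. Real engines add analyzers, relevance ranking and on-disk segments.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a crude stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of record ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of records containing every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids creating empty postings for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("hcard:1", "Jane Doe, Example Corp, San Francisco")
index.add("hcalendar:7", "Microformats meetup in San Francisco")
print(sorted(index.search("san francisco")))  # ['hcalendar:7', 'hcard:1']
print(index.search("meetup jane"))            # set()
```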

We cache all objects in memcache with indefinite timeouts (all cache clearing is done proactively). Each object's cache entry also includes all of its related items.
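A sketch of that caching policy, with an in-process dictionary standing in for memcache (an assumption for illustration; with memcached itself, an expiration time of 0 means the item does not expire). The backing store DB and the functions load_hcard() and add_tag() are hypothetical. Reads fill the cache on a miss and store the object together with its related items under one key; writes delete that key immediately rather than waiting for a timeout.

```python
class FakeCache:
    """In-process stand-in for memcache; entries never expire on their own."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value  # no TTL: the entry lives until deleted

    def delete(self, key):
        self._data.pop(key, None)


# Hypothetical backing store: hCard rows and their related tags.
DB = {"hcards": {1: {"fn": "Jane Doe"}}, "tags": {1: ["microformats"]}}
cache = FakeCache()
db_reads = 0  # counts trips to the backing store


def load_hcard(card_id):
    """Return an hCard plus its related items, stored as ONE cache entry."""
    global db_reads
    key = f"hcard:{card_id}"
    entry = cache.get(key)
    if entry is None:  # miss: build the whole bundle from the store
        db_reads += 1
        entry = {
            "card": dict(DB["hcards"][card_id]),
            "tags": list(DB["tags"][card_id]),
        }
        cache.set(key, entry)
    return entry


def add_tag(card_id, name):
    """Write to the store, then proactively clear the now-stale entry."""
    DB["tags"][card_id].append(name)
    cache.delete(f"hcard:{card_id}")


load_hcard(1)
load_hcard(1)
print(db_reads)               # 1: the second call was a cache hit
add_tag(1, "parsing")
print(load_hcard(1)["tags"])  # ['microformats', 'parsing']
print(db_reads)               # 2: the deleted entry was rebuilt once
```

Bundling related items into one entry makes a read a single cache lookup, at the cost of having to invalidate that entry whenever any of the related items changes.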

When it comes down to it, it's all a matter of scale. When we were indexing 10^5 and 10^6 items, we would actually parse some of the markup on the fly when someone did a search. Sounds crazy, but it worked all right for a while (I blame Tantek). Now we parse it all out into a relatively normalized model. We're at 10^8 or so items now. If we hit another order of magnitude, we'll have to rethink things and probably move some data (like BLOBs) out of the relational database and put it somewhere else.
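Parsing up front means doing the extraction once, at crawl time, and storing flat records instead of re-parsing markup per search. A minimal sketch using Python's standard html.parser, assuming well-formed markup and covering only the hCard properties fn, org and url; the class HCardParser is hypothetical and far from a complete hCard parser (it ignores nested vcards, the include pattern and most of the spec).

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not change the depth.
VOID = {"area", "br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collect fn, org and url from class="vcard" elements (sketch only)."""

    PROPS = ("fn", "org", "url")

    def __init__(self):
        super().__init__()
        self.cards = []       # finished cards, one dict each
        self._card = None     # card currently being filled, if any
        self._depth = 0       # element depth inside the current vcard
        self._prop = None     # property whose text is being collected
        self._prop_depth = 0  # depth at which that property element opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._card is None:
            if "vcard" in classes:  # entering a new hCard
                self._card, self._depth = {}, 1
            return
        self._depth += 1
        for prop in self.PROPS:
            if prop not in classes:
                continue
            if prop == "url" and attrs.get("href"):
                self._card["url"] = attrs["href"]  # url taken from href
            elif self._prop is None:
                # Text-valued property: collect text until this element closes.
                self._prop, self._prop_depth = prop, self._depth
                self._card[prop] = ""

    def handle_data(self, data):
        if self._prop is not None:
            self._card[self._prop] += data

    def handle_endtag(self, tag):
        if self._card is None or tag in VOID:
            return
        if self._prop is not None and self._depth == self._prop_depth:
            # Collapse runs of whitespace in the collected text.
            self._card[self._prop] = " ".join(self._card[self._prop].split())
            self._prop = None
        self._depth -= 1
        if self._depth == 0:  # the vcard element itself just closed
            self.cards.append(self._card)
            self._card = None


html = ('<div class="vcard"><a class="url fn" href="http://example.com/">'
        'Jane Doe</a> <span class="org">Example Corp</span></div>')
parser = HCardParser()
parser.feed(html)
parser.close()
print(parser.cards)
# [{'fn': 'Jane Doe', 'url': 'http://example.com/', 'org': 'Example Corp'}]
```

Each dict in parser.cards can then be written once into a per-format table, so searches read stored rows rather than parsing pages on the fly.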


-ryan
_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
