RE: [cms-list] managing metadata

Charles Reitzel Sun, 09 Feb 2003 17:55:02 -0800

Hi Nuno,

At 12:03 AM 2/8/2003 +0000, Nuno Lopes wrote:

Hi Charley, very interesting indeed your little MV query engine, how have you implemented it? In other words what is the underlying data store?

Charley Bay and I are two different people. I actually go by Charlie most places; I decided to go by Charles on this list to avoid confusion with you, Charley! No biggie.

Anyway, I am hesitant to give too many details, but yes, it uses indexing as supplied by BerkeleyDB. Actual content, I keep in the file system.

1) Is it an RDBMS?
2) Is it a file or multiple files in the FS? If so what is the format of those files, XML?

In the case of 1), do you use the LIKE statement to apply pattern-matching rules such as us.% (<=> us.*)? If not, how do you make it happen while minimizing joins over multiple tables (that is the problem with Tony's approach, IMO)?

I have implemented my own LIKE-like behavior that allows a minimal sort of pattern matching on tag (column) names. Things can get kind of interesting and fun when you cast off the shackles of SQL... Of course, you give up some things as well.
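The actual matching rules in Charlie's engine aren't described here. As a rough illustration only, here is a sketch that assumes plain SQL LIKE semantics ('%' = any run of characters, '_' = exactly one) applied to tag names rather than to column values. `like_to_regex` and `match_tags` are hypothetical names, and LIKE's ESCAPE clause is not supported.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")          # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch)) # everything else is literal, including '.'
    # re.match anchors at the start; \Z anchors at the end of the string.
    return re.compile("".join(parts) + r"\Z", re.DOTALL)

def match_tags(pattern, tags):
    """Return the tag names (in their original order) that match the pattern."""
    rx = like_to_regex(pattern)
    return [t for t in tags if rx.match(t)]

tags = ["coverage.location", "coverage.country", "coverage.state",
        "coverage.city", "title"]
print(match_tags("coverage.%", tags))
# ['coverage.location', 'coverage.country', 'coverage.state', 'coverage.city']
```

Because the pattern runs over tag names, one query can address a whole family of related tags without a join per tag.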

In the case of 2), do you maintain indexes for fast string matching over (at least) megabytes of info? If so, how do you maintain them? If not, do you use B-trees or something similar?

Right.
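The point of a B-tree (or any sorted) index is that a prefix query such as 'us.%' becomes a single range scan. Position at the first key >= the prefix and read forward until keys stop sharing it. Berkeley DB B-tree cursors support this kind of positioning (its DB_SET_RANGE flag), but the sketch below does not use Berkeley DB. It is a hypothetical in-memory stand-in built on Python's `bisect` module, just to show the access pattern.

```python
import bisect

class SortedIndex:
    """Hypothetical stand-in for a B-tree index: (value, doc_id) pairs kept sorted."""

    def __init__(self):
        self._entries = []  # always sorted by (value, doc_id)

    def add(self, value, doc_id):
        bisect.insort(self._entries, (value, doc_id))

    def prefix_scan(self, prefix):
        # Find the first entry whose value is >= prefix; (prefix,) sorts before
        # any (prefix..., doc_id) tuple because a shorter equal tuple is smaller.
        i = bisect.bisect_left(self._entries, (prefix,))
        hits = []
        # Values sharing a prefix are contiguous in sorted order, so stop at the first miss.
        while i < len(self._entries) and self._entries[i][0].startswith(prefix):
            hits.append(self._entries[i][1])
            i += 1
        return hits

idx = SortedIndex()
for value, doc in [("us.ma.springfield", "doc1"), ("us.fl.miami", "doc2"),
                   ("uk.london", "doc3"), ("us.mn.saintpaul", "doc4")]:
    idx.add(value, doc)

print(idx.prefix_scan("us."))  # ['doc2', 'doc1', 'doc4']
```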

>coverage.location = 'us.ma.springfield'
>coverage.country = 'us'
>coverage.state = 'ma'
>coverage.city = 'springfield'
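One way to get the four tags above from a single structured value is to split it on '.' and store each component under its own tag. A minimal sketch, assuming the hierarchy levels are always country/state/city in that order; `expand_location` and the level list are hypothetical.

```python
def expand_location(value, levels=("country", "state", "city")):
    """Store the full structured value plus one tag per component."""
    tags = {"coverage.location": value}
    # zip stops at the shorter sequence, so 'us.ma' yields only country and state.
    for name, part in zip(levels, value.split(".")):
        tags[f"coverage.{name}"] = part
    return tags

print(expand_location("us.ma.springfield"))
```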

Yes, I also thought about implementing some syntactic sugar over SQL queries.

Syntactic sugar is just that. A pretty UI is probably good for newbies and simple tasks, but I'd recommend you allow direct access to the SQL for tuning and/or complex cases.

Instead of having:

(A)

SELECT *
FROM Company
WHERE MVP_GEO_COUNTRY = 'us' AND MVP_GEO_COUNTRYREGION = 'ma'
  AND MVP_GEO_CITY = 'springfield'

We could allow queries such as:

(B)
SELECT *
FROM Company
WHERE GEO = 'us.ma.springfield' (not SQL at all)

OR

(C)
SELECT *
FROM Company
WHERE GEO = 'us; ma; springfield' (not SQL at all; Dublin Core
Structured Values (DCSV) scheme)

OR

(D)
SELECT *
FROM Company
WHERE GEO = 'COUNTRY:us; COUNTRY.COUNTRYREGION:ma; COUNTRY.COUNTRYREGION.CITY:springfield;'
(using the Dublin Core Structured Values (DCSV) scheme)
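A preprocessor handling (C) and (D) first has to split the DCSV string into its components. A simplified sketch, assuming only the basic DCSV rules: components separated by ';', an optional label separated from the value by ':'. The spec's escaping rules are deliberately ignored, and `parse_dcsv` is a hypothetical name.

```python
def parse_dcsv(text):
    """Split a DCSV string into (label, value) pairs; label is None if absent."""
    pairs = []
    for component in text.split(";"):
        component = component.strip()
        if not component:
            continue  # e.g. the empty piece after a trailing ';'
        label, sep, value = component.partition(":")
        if sep:
            pairs.append((label.strip(), value.strip()))
        else:
            pairs.append((None, component))
    return pairs

print(parse_dcsv("COUNTRY:us; COUNTRY.COUNTRYREGION:ma; "
                 "COUNTRY.COUNTRYREGION.CITY:springfield;"))
print(parse_dcsv("us; ma; springfield"))
```

The labelled form (D) carries its own hierarchy in the dotted labels, while the unlabelled form (C) relies on position, which is why (C) only works if every value lists its levels in the same order.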

The best way to see the implication is analyzing the problem bottom up:

((D) <=> (C) <=> (B)) => (A)

In other words (D), (C), (B) would of course be converted to (A) by an
SQL preprocessor before sending the query to the RDBMS (more overhead).
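A minimal sketch of that preprocessing step for form (B), assuming one column per hierarchy level with the (hypothetical) column names from query (A). It emits a parameterized WHERE clause rather than pasting values into the SQL text, and an in-memory SQLite table stands in for the real RDBMS.

```python
import sqlite3

# Hypothetical column names, taken from query (A); one column per hierarchy level.
GEO_COLUMNS = ("MVP_GEO_COUNTRY", "MVP_GEO_COUNTRYREGION", "MVP_GEO_CITY")

def rewrite_geo(value, columns=GEO_COLUMNS):
    """Rewrite GEO = 'us.ma.springfield' into a parameterized WHERE clause."""
    parts = value.split(".")
    if len(parts) > len(columns):
        raise ValueError(f"too many components in {value!r}")
    clause = " AND ".join(f"{col} = ?" for col in columns[:len(parts)])
    return clause, parts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (name TEXT, MVP_GEO_COUNTRY TEXT, "
             "MVP_GEO_COUNTRYREGION TEXT, MVP_GEO_CITY TEXT)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?, ?)",
                 [("Acme", "us", "ma", "springfield"),
                  ("Globex", "us", "fl", "miami")])

clause, params = rewrite_geo("us.ma.springfield")
rows = conn.execute(f"SELECT name FROM Company WHERE {clause}", params).fetchall()
print(rows)  # [('Acme',)]
```

A shorter value such as 'us' or 'us.ma' simply produces fewer conditions, so prefix-style queries fall out of the same rewrite.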

SELECTs requiring multiple unlinked tables could be mapped onto a
UNION (ugh, really slow).
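Roughly, the preprocessor would emit one SELECT per table and glue them with UNION. A sketch using SQLite in memory; the `Office` table and `union_query` helper are hypothetical. Note that plain UNION also removes duplicate rows, which is part of its cost.

```python
import sqlite3

def union_query(tables, clause):
    """One SELECT per table, joined with UNION.
    Table names are interpolated directly, so they must come from a trusted list."""
    return " UNION ".join(f"SELECT name FROM {t} WHERE {clause}" for t in tables)

conn = sqlite3.connect(":memory:")
for table in ("Company", "Office"):  # hypothetical tables with no key linking them
    conn.execute(f"CREATE TABLE {table} (name TEXT, MVP_GEO_COUNTRY TEXT)")
conn.execute("INSERT INTO Company VALUES ('Acme', 'us')")
conn.executemany("INSERT INTO Office VALUES (?, ?)",
                 [("Boston HQ", "us"), ("London", "uk")])

tables = ["Company", "Office"]
sql = union_query(tables, "MVP_GEO_COUNTRY = ?")
# Each SELECT has its own placeholder, so the parameter is repeated once per table.
rows = conn.execute(sql, ["us"] * len(tables)).fetchall()
print(sorted(rows))  # [('Acme',), ('Boston HQ',)]
```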

But for now, we are not planning to support this syntactic sugar, mainly because users will define query schemas through a GUI, one derived from faceted classification, that does more or less the same thing. Our goal is to let the system be configured without coding, including query schemas (search interfaces).

Best regards,

Nuno Lopes

PS: Multi values in DC are called Structured Values

I would be careful not to mix up hierarchical (aka structured) values with multiple values, which may be either simple or structured. E.g.

coverage.location = 'us.ma.springfield'
coverage.location = 'us.fl.miami'
coverage.location = 'us.mn.saintpaul'

Your heuristic approach is, in essence, supporting single, structured values but also exposing the component values. If I understand you, this design is based on your observation that the most common use for multiple values is to hold the components of a structured value. I don't doubt there is some merit to your approach, but I wouldn't promote it as supporting MV. You might instead describe it as having flexible support for structured values.

It will not work in all cases. The most common use for MV tags is simple cross-references. You want a chunk of content to be bundled in multiple "views". Just add as many view names as you want the document to appear within. The software doesn't care; it is data driven.
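A sketch of the data-driven "views" idea, assuming each document simply carries a multi-valued list of view names and a view is whatever documents mention it. The document and view names below are made up for illustration.

```python
from collections import defaultdict

def build_views(docs):
    """Invert doc -> [view names] into view -> {doc ids}."""
    views = defaultdict(set)
    for doc_id, view_names in docs.items():
        for view in view_names:
            views[view].add(doc_id)
    return views

# Hypothetical documents, each tagged with as many view names as needed.
docs = {
    "press-release.html": ["news", "investors", "homepage"],
    "faq.html": ["support", "homepage"],
}
views = build_views(docs)
print(sorted(views["homepage"]))  # ['faq.html', 'press-release.html']
```

Adding a document to another view is just one more tag value; no schema or code change is involved.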

take it easy,
Charlie

--
http://cms-list.org/
more signal, less noise.
