Nicola Ken Barozzi wrote:
Ross Gardler wrote:

Nicola Ken Barozzi wrote:


...

Meta-data is often processed independently of actual data. For example, a meta-data harvester is not interested in content. Of course, this would still be possible if it were all in the same file, but performance would suffer.


So it's a technical and not a design issue. I tend to optimize later (and sometimes get bitten ;-)

That is also my method, and I am being bitten on the Learning Objects projects - hence my concern here. The solution I have implemented is a simple one in which the meta-data is embedded in the XHTML, and it's a real problem.



In some use cases this performance bottleneck will become an issue. For example, I have Learning Objects that consist of over 600 pages, each with an average of around 4000 characters. Each page has an additional 1000 characters of meta-data (not including XML elements). The nature of XML is that you need to process the whole file even if you only want one element. This means we are processing 5000 characters of data per page instead of 1000; when harvesting 600 pages, that is 2,400,000 characters of data that we don't want (and we are still ignoring the XML elements).
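Working the quoted figures through (600 pages, roughly 4000 characters of content and 1000 characters of meta-data per page, markup ignored) gives the totals below. This is only a back-of-the-envelope sketch; the variable names are illustrative.

```python
# Figures taken from the message above; XML markup is ignored.
PAGES = 600
CONTENT_CHARS_PER_PAGE = 4000
META_CHARS_PER_PAGE = 1000

# Characters read if every page is processed in full.
total_read = PAGES * (CONTENT_CHARS_PER_PAGE + META_CHARS_PER_PAGE)
# Characters a meta-data harvester actually needs.
wanted = PAGES * META_CHARS_PER_PAGE
# Characters read but not needed.
unwanted = total_read - wanted

print(f"read: {total_read:,}  wanted: {wanted:,}  unwanted: {unwanted:,}")
```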


Well, that's not true, as the streaming parser can stop processing at any time. This is how we get the doctype.
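A minimal sketch of that idea, using Python's standard `xml.sax` module rather than anything from Forrest itself: the handler collects `<meta>` elements and aborts as soon as `</head>` is reached. The class names, the `harvest` helper and the sample page are assumptions made for illustration.

```python
import xml.sax


class StopParsing(Exception):
    """Raised by the handler to abort parsing once the head has been read."""


class MetaHarvester(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.meta = {}           # name -> content for each <meta> element
        self.elements_seen = []  # every element the handler was told about

    def startElement(self, name, attrs):
        self.elements_seen.append(name)
        if name == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def endElement(self, name):
        if name == "head":
            # All embedded meta-data lives in <head>; skip the rest.
            raise StopParsing


def harvest(xhtml_bytes):
    """Parse only as far as </head> and return the handler with its results."""
    handler = MetaHarvester()
    try:
        xml.sax.parseString(xhtml_bytes, handler)
    except StopParsing:
        pass  # expected: we aborted on purpose
    return handler


# Hypothetical sample page, not taken from a real Forrest site.
PAGE = b"""<html>
  <head>
    <title>Sample page</title>
    <meta name="author" content="A. Author"/>
    <meta name="subject" content="XML"/>
  </head>
  <body><p>Several thousand characters of content would go here.</p></body>
</html>"""

result = harvest(PAGE)
print(result.meta)
print("body" in result.elements_seen)
```

With `parseString` the whole buffer is already in memory, so the saving is in handler callbacks only. When parsing from a file or stream, the reader feeds the parser in chunks, so aborting early should also avoid reading the remaining chunks.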

OK, I concede that one :-))

Furthermore, meta-data tends to be highly structured and would therefore benefit from being stored in a relational database rather than an XML one.


Hmmm, this is an interesting point.

I have an RDBMS Database plugin sitting on my hard drive that was built exactly for this purpose. I've not had the time to package it nicely yet, but it will be committed when time allows (or someone asks for it).
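To illustrate why structured meta-data suits a relational store, here is a small sketch using Python's built-in `sqlite3`. It is not the plugin mentioned above; the `page_meta` table, its columns, the `pages_with` helper and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database holding one row per (page, meta-data element, value).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE page_meta (
           page    TEXT NOT NULL,
           element TEXT NOT NULL,   -- e.g. a Dublin Core element name
           value   TEXT NOT NULL
       )"""
)

# Hypothetical sample data.
rows = [
    ("intro.html", "creator", "A. Author"),
    ("intro.html", "subject", "XML"),
    ("setup.html", "creator", "B. Writer"),
    ("usage.html", "creator", "A. Author"),
]
conn.executemany("INSERT INTO page_meta VALUES (?, ?, ?)", rows)


def pages_with(conn, element, value):
    """Return the pages whose meta-data element has the given value."""
    cur = conn.execute(
        "SELECT page FROM page_meta WHERE element = ? AND value = ? ORDER BY page",
        (element, value),
    )
    return [row[0] for row in cur]


print(pages_with(conn, "creator", "A. Author"))
```

A query like this touches only the meta-data table, so no page content needs to be parsed to answer it.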


My use case for metadata is title, author, etc. You have a much more complex use case. I now start to understand more of your POV.

I'm sure this will be the most common use case. Perhaps we now have the first situation where we need two alternative plugins doing the same basic job: a simple meta-data plugin for handling the majority use case of author, subject etc., and a complex one with an RDBMS and full Dublin Core meta-data.


All that being said, Forrest could be made to support both a separate file and embedded data (there are use cases where the simplistic solution is the best one). The problem with this is that we will have two locations for storing the same data - could be confusing for users.


IMHO confusing users is the least of our worries. I have seen that when there is a simple way and a more complete way, users do not get confused.

OK

The only confusion would come out of using both methods at the same time, with clashing metadata values. Ouch!
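One way to catch such clashes, sketched under the assumption that both sources can be reduced to simple name/value mappings (the `find_clashes` function and the sample values are illustrative, not part of Forrest):

```python
def find_clashes(embedded, external):
    """Return {name: (embedded_value, external_value)} for keys that disagree."""
    shared = embedded.keys() & external.keys()
    return {
        name: (embedded[name], external[name])
        for name in sorted(shared)
        if embedded[name] != external[name]
    }


# Hypothetical meta-data for one page from the two locations.
embedded = {"title": "Intro", "author": "A. Author"}
external = {"title": "Introduction", "author": "A. Author", "subject": "XML"}

clashes = find_clashes(embedded, external)
print(clashes)
```

Keys present in only one source are not clashes; a build could warn or fail only when the returned mapping is non-empty.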

I'll browse the web for 'rdf in html', 'xhtml metadata' etc. to see how this is defined elsewhere. I want to reinvent the wheel as little as possible.

I see from another mail you have found some solutions, so I'll return to this elsewhere.



Also, what is the relationship between skinconf.xml, pdf-output.conf.xml and metadata.xml?


The files in FORREST-INF are the defaults. So skinconf.xml contains the site-wide defaults for presentation elements that are core to Forrest. pdf-output.conf.xml contains the defaults for PDF config (button on or off, page size etc.). metadata.xml contains site-wide meta-data (generated-by, site title etc.).


Why are skinconf.xml and pdf-output.conf.xml separate?

My thinking was only that the items in pdf-output.conf.xml are irrelevant if the pdf-output plugin is not present. Consequently, we do not want to clutter skinconf.xml in fresh-site with config information not relevant to the core.


An alternative would be to have the plugin add the elements into the project skinconf.xml file the first time the site is built with the plugin enabled.


It seems we are in violent agreement on the concept, just need to iron out some smaller details.

Yes, and to see what Thorsten comes up with on the forrest:views front.

Ross
