Hi Karl,

Personally, I would choose the shortest way to make things work. ;-) MarkLogic Server doesn't require you to choose between the three, though; you can mix them if you like.
If your current data follows a certain standard, it most likely does so for a reason. Perhaps you need to exchange data with other parties or applications. That is a very strong reason to preserve the content in its original format, whether MarkLogic Server handles it well or not.

Thanks to namespaces and document properties in MarkLogic Server, though, it is quite easy to add information that is optimized for searching or user presentation, so that less optimally structured content works better. You can always:

- store calculated data in document properties;
- add namespaced attributes to specific nodes to optimize certain things, and filter them out when exchanging data with other systems;
- add meta information in a separate XML structure that is inserted into the existing data structure;
- wrap the contents in a new root element, which allows additional information at root level.

Document properties avoid mingling the data altogether, and with the last option separating the data again is very easy.

Apart from that, it is just as likely that MarkLogic Server performs really well with the existing structure, provided indexes and search expressions are chosen carefully.

Unfortunately, you leave us in the dark as to why you think #2 should dominate entirely over the others. Perhaps you could elaborate on that first? And while you are at it, give us some hints about the big picture: what are you trying to achieve in general with MarkLogic Server?

Kind regards,
Geert

Drs. G.P.H. Josten
Consultant
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
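
P.S. To make the wrapping and filtering ideas a bit more concrete, here is a minimal sketch. It is plain Python with the standard library rather than MarkLogic code, and only shows the shape of the idea. The namespace URI, the envelope/meta/content element names, the sample record and the wrap/export helpers are all made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical namespace for search-only additions; not part of any standard.
OPT_NS = "http://example.com/search-opt"
ET.register_namespace("opt", OPT_NS)


def wrap(original, meta):
    """Wrap an untouched copy of the standard document in a new root element."""
    envelope = ET.Element(f"{{{OPT_NS}}}envelope")
    meta_el = ET.SubElement(envelope, f"{{{OPT_NS}}}meta")
    for name, value in meta.items():
        ET.SubElement(meta_el, f"{{{OPT_NS}}}{name}").text = value
    content = ET.SubElement(envelope, f"{{{OPT_NS}}}content")
    content.append(copy.deepcopy(original))
    return envelope


def export(envelope):
    """Recover the original document and strip attributes in OPT_NS."""
    content = envelope.find(f"{{{OPT_NS}}}content")
    original = copy.deepcopy(content[0])
    for el in original.iter():
        for attr in [a for a in el.attrib if a.startswith(f"{{{OPT_NS}}}")]:
            del el.attrib[attr]
    return original


# Made-up sample record standing in for a standards-compliant document.
standard = ET.fromstring("<record><patient>Jane Doe</patient></record>")
stored = wrap(standard, {"sortName": "doe jane"})

# Annotate a node inside the stored copy with a search-only attribute.
stored.find(".//patient").set(f"{{{OPT_NS}}}normalized", "doe jane")

# What other systems receive: the original structure, without the additions.
exchanged = export(stored)
print(ET.tostring(exchanged, encoding="unicode"))
```

The stored document carries the extra metadata and attributes for searching, while the exported one should match what was loaded, so no transformation beyond unwrapping is needed when exchanging data.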

> From: [email protected]
> [mailto:[email protected]] On Behalf Of Karl Erisman
> Sent: Monday, 23 November 2009 3:14
> To: [email protected]
> Subject: [MarkLogic Dev General] XML structure/schema design for MLS
>
> I have a general question about choosing an XML structure
> (schema design if using schemas) for use with MarkLogic. My
> particular situation involves storing clinical data. There
> are multiple opposing forces that could motivate choosing one
> schema structure over another.
> The main ones are:
>
> (1) standards compliance: it would be nice if the internal
> storage format is compatible with existing standard schemas
> for clinical data in XML (to take advantage of existing tools
> that work against the standard schemas and to allow exchange
> with external systems without requiring transformation)
> (2) ease of handling in MLS, specifically *indexing* and *searching*
> (3) "clean" XML (structure that makes sense semantically to a
> human viewer)
>
> The more I experiment with cts:query and search:search, the
> more I tend to think that #2 should dominate entirely, to the
> point of ignoring the others. As it turns out, some standard
> data formats are really awkward to work with in MLS.
>
> So, do others just organize their content specifically for
> MLS and run transformations when needed? What does Mark
> Logic recommend? What have your experiences been?
>
> Thank you,
> Karl
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
