This is, in the general case, quite tricky. First off, it's a chicken-and-egg problem ... how do you detect which schema applies? If the schema is identified in the input, good ... but if not, you need rules to find it. Next, a schema *rarely* gives enough information on how to generate usable documents. You can generate "valid" documents given a schema, but usually not "usable" ones; try some tools out, like Oxygen, and see what they do. The problem is that a schema is designed to detect invalid input and reject it, not to define what "reasonable, useful" input is.
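To make the "valid but not usable" point concrete, here is a toy sketch (it has nothing to do with any real validator; the content model and tag names are invented for illustration). For a content model like (p|table|image)*, the empty document is perfectly valid:

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: (p|table|image)* -- a Kleene star over a
# set of allowed children. For a star-only model, validity reduces to
# "every child element is one of the allowed names."
ALLOWED = {"p", "table", "image"}

def is_valid(xml_text: str) -> bool:
    """Check a document against the toy content model above."""
    root = ET.fromstring(xml_text)
    return all(child.tag in ALLOWED for child in root)

print(is_valid("<body/>"))                    # True -- valid, but empty and useless
print(is_valid("<body><p/><table/></body>"))  # True
print(is_valid("<body><div/></body>"))        # False
```

A real validator would also check ordering and occurrence constraints, but the point stands: the schema happily blesses a document with nothing in it.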
For example, a typical markup language may allow (p|br|table|image|reference)*. A valid instance of that is nothing at all ... but nothing is not usable. That's an edge case, but it's real.

A tangibly similar problem is designing JSON-to-XML conversions. If you only have the XML schema, it's difficult or impossible to generate the XML you *want* from just the data and the schema. You need out-of-band information (code). You can see examples of this in our JSON library: http://docs.marklogic.com/json:transform-from-json . The "custom" configuration there is a set of rules attempting to produce "decent" XML from JSON, or vice versa, and it does make use of schema ... but only for atomic types. This paper goes into much more depth on the issues: http://www.balisage.net/Proceedings/vol7/html/Lee01/BalisageVol7-Lee01.html . It's very tricky ... and the direction taken with the above approach requires an annotated schema to give hints, plus input documents corresponding very closely to the desired output.

Another problem is that even if a schema had every bit of information you need, it's horrendously difficult to parse and make use of. ML has *some* schema query ability, but not in a general way. It's designed so that you start with an XML node already read in, and then you can query its schema structure ... but you can't (easily) start with just a schema and query "what kinds of things go here?"

My opinion: this direction will seem wonderful at first but will end up a nightmare and a failure. I suggest instead using some kind of out-of-band information ... like XSLT or XQuery or another "templating" kind of technology, hand-made for each schema you want to use, and designed for the kind of input you'll be getting. It's very tempting to want this done generically, without having to manually create the mappings ... but it's not only really difficult ... well, it's impossible. Impossible in the sense that what everyone I know (and, I'm assuming, you too) *really* wants is to produce a specific "nice" version of the output.
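To illustrate the out-of-band-information point, here is a minimal sketch in Python (this is not the ML JSON library; the rule names and behavior are invented). Notice that every entry in RULES is a decision the schema simply cannot make for you:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical rule set: the kind of out-of-band configuration a generic
# JSON-to-XML converter needs. None of these choices come from a schema.
RULES = {
    "array-element": "item",   # wrap each array member in <item>
    "attribute-keys": {"id"},  # keys to emit as attributes, not elements
}

def json_to_xml(name, value, rules=RULES):
    """Convert a parsed JSON value into an XML element, guided by rules."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            if key in rules["attribute-keys"]:
                elem.set(key, str(child))
            else:
                elem.append(json_to_xml(key, child, rules))
    elif isinstance(value, list):
        for member in value:
            elem.append(json_to_xml(rules["array-element"], member, rules))
    else:
        elem.text = "" if value is None else str(value)
    return elem

doc = json.loads('{"id": 7, "tags": ["xml", "json"]}')
print(ET.tostring(json_to_xml("doc", doc), encoding="unicode"))
# -> <doc id="7"><tags><item>xml</item><item>json</item></tags></doc>
```

Change "array-element" to a different name, or move "id" out of "attribute-keys", and you get a different but equally schema-valid document ... which is exactly why the mapping has to live outside the schema.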
Not just any conforming output, but one that is structured and maps things the way you want.

When I first started on projects like this, I didn't fully appreciate that the problem of "nice" is not only undefinable, but typically self-conflicting when you try to define it. That is, I find "this time I want arrays turned into nested elements," but "this other time I want arrays flattened into a single element," and "sometimes, if the document has only one element, make it an attribute" ... I know this very well first-hand, from trying to achieve that magic goal for many years. You can ask anyone using the ML JSON library (which is really a simplified, restricted form of this same problem): what seems "obvious" actually isn't. Making one desired format work tends to break others, and it's very tedious to dig down and discover why.

Even if you don't care (or can choose not to care) what the exact output looks like, you're still going to have a hard time generically mapping input to a document corresponding to a schema ... unless the document structure, and all its possibilities, is very well known in advance. If you know your data precisely, and you can identify precisely which fields map to which structure, then you can do it. But alas, that's the problem: if you can do that, you don't need a schema, you need a transformation-mapping tool (XSLT, XQuery, something). Schema won't help with the mapping problem at all, because it has no information in it about meaning, nor any kind of cross-referencing to input data not already in that schema's format.

There is one way out ... and it's generally not acceptable to most. That is a schema which is extremely free-form. You still won't need the schema, but if it looks something like <field name="fieldname">data</field> (think CSV), then you can map nearly anything to it automatically ... you just won't get much value from it.
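That free-form escape hatch is trivial to implement, which is exactly the point ... here is a sketch (the record/field element names are invented for illustration):

```python
import xml.etree.ElementTree as ET

# The "extremely free form" schema: every key/value pair becomes
# <field name="...">value</field>. Anything flat maps automatically,
# and the result tells you about as much as a CSV row would.
def to_fields(record: dict) -> ET.Element:
    root = ET.Element("record")
    for key, value in record.items():
        field = ET.SubElement(root, "field", name=key)
        field.text = str(value)
    return root

row = {"title": "NewsML item", "author": "Drew"}
print(ET.tostring(to_fields(row), encoding="unicode"))
# -> <record><field name="title">NewsML item</field><field name="author">Drew</field></record>
```

Everything validates, nothing is meaningful: the names carry all the semantics, and the schema carries none.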
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Wanczowski, Andrew
Sent: Thursday, June 05, 2014 5:41 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Generating XML from Schemas

I am looking to generate documents that are not coming from external sources. For example, we would have editors filling out web forms and then posting them to ML. At that time I would like to detect what type of content they are trying to create, and then generate a document using the appropriate schema.

-Drew

On 6/5/14 4:38 PM, "Michael Blakeley" <[email protected]> wrote:

>Can you expand on the need to "generate XML documents"? If you've got
>NewsML etc. coming in from external sources, what sort of documents are
>"generated within the system"?
>
>-- Mike
>
>On 5 Jun 2014, at 06:58 , Wanczowski, Andrew
><[email protected]> wrote:
>
>> Hello All,
>>
>> I am currently looking to build a metadata store for various
>> documents and want them to remain in their native schemas. Content
>> will be generated within the system and come from external systems.
>> Some examples are the PRISM, NewsML and IPTC/XMP schemas.
>>
>> I have been investigating ways to generate XML documents to be stored
>> in MarkLogic. The great thing about MarkLogic is that you can have
>> multiple schemas or schemaless documents in your database. However,
>> this becomes challenging when you want your content to originate in
>> MarkLogic, or MarkLogic applications to control full CRUD of the
>> documents. I am looking for something scalable, where we would only
>> have to manage one library for all CRUD functions. The current
>> approach would be to have a library module for each schema which
>> handles all CRUD and serialization/de-serialization. This becomes a
>> maintenance headache.
>>
>> The desired features would be:
>>
>> - A single library module to handle document generation
>> - Generate a document based on an XML Schema
>> - Create, Update and Partial Update should be supported
>> - Values should be populated based on user's input, from another XML
>>   document or JSON document
>> - Input mappings should be configurable from both XML and JSON
>> - Serialization/de-serialization of XML and JSON for API usage or
>>   web form usage
>>
>> ExistDB has a way to generate an instance from an XML Schema.
>> Documentation can be found at
>> http://en.wikibooks.org/wiki/XQuery/XML_Schema_to_Instance . But this
>> does not do all the features desired.
>>
>> Any input would be extremely helpful!
>>
>> Thanks
>> Drew
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>
>_______________________________________________
>General mailing list
>[email protected]
>http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
