Hi Luke,

I would like to emphasize (or simply remind you of) two key features of XPath (and XML technology in general).

The FIRST one is that information is treated identically whether it sits in a single document, in a collection of documents, or in a collection of document fragments. So, for example, $data//foo works regardless of whether $data is one document, a collection of documents, a single element extracted from some document, or a collection of elements extracted from multiple documents, or even from a mixture of documents exposed by a database, the file system, REST service responses, etc. Collecting documents into a single document prior to processing is therefore (in my opinion) somewhat against the grain of what XML technology excels at.
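
To illustrate the point about $data//foo, here is a minimal sketch. The sample documents, the fragment and the element names (root, bar, part) are hypothetical and chosen only for illustration; in a BaseX setting, $data could just as well come from a database or a collection.

```xquery
(: Hypothetical sample data; the element names are illustrative only. :)
let $doc1     := document { <root><foo>a</foo></root> }
let $doc2     := document { <root><bar><foo>b</foo></bar></root> }
let $fragment := <part><foo>c</foo></part>

(: The same path expression is applied to three different kinds of input. :)
let $single := $doc1                      (: one document :)
let $many   := ($doc1, $doc2)             (: a sequence of documents :)
let $mixed  := ($doc1, $doc2, $fragment)  (: documents plus a standalone element :)

return (
  count($single//foo),
  count($many//foo),
  count($mixed//foo)
)
```

The three counts should come out as 1, 2 and 3: the path expression itself never changes, only the sequence bound to the variable does.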

The SECOND point is that XPath has been specified with mathematical precision, so I cannot imagine being more precise and concise when it comes to defining *rules*. (That XPath expressions cannot easily replace a grammar is a different matter, of course.)

And finally - I would not overemphasize the importance of using Schematron, as equivalent validation functionality is fairly easy to implement using just XQuery/XPath: it is the XPath language that is the engine and heartbeat of it all, and it is a secondary question whether one uses the Schematron framework, ingenious and handy though it is for typical single-document checks.

Cheers,
Hans

On Friday, 13 December 2019, 07:53:48 CET, ERRINGTON Luke <luke.erring...@sydac.com> wrote:

Hi Christian,

Thank you for your time in preparing your response and examples. You describe the approach that I thought would be necessary if we couldn't get some sort of schema validation to work. Unfortunately the specification of the validation requirements in XQuery code is not as clean, clear or minimal as might be desired.

It would be nice to have some sort of pre-commit hook for validating modifications to the database so that we are not restricted to only allowing modifications through XQuery. It looks as though this is the point of https://github.com/BaseXdb/basex/issues/1082, but that appears to be on hold, after some significant discussion.

Presumably I could achieve schema validation by having the entire data set inside one document, but that would lose the benefits of collections, and having the data arranged similar to a file system, so ...

I was hoping that I could define a Schematron rule something like this (untested, because I'm struggling to get Schematron working at the moment - content is not allowed in prolog):

  <schema>
    <pattern>
      <rule context="mapping">
        <assert test="@object_from_id = //object/@id">Trying to map invalid object id</assert>
        <assert test="@object_to_id = //object/@id">Trying to map invalid object id</assert>
      </rule>
    </pattern>
  </schema>

This is relatively minimal and expressive. It seems to work just by XPath, so all I need is //object/@id to find the object IDs present in all documents, not just this one. But, when I use //object/@id as a path in BaseX it does just that! It returns all of the object IDs, in all of the documents - so maybe this schema can be used across all documents at once! That would be fantastic!

Of course, in practice I am not sure if this can be done, and I am pretty new to all of this. I see that currently schematron::validate requires a node as an input. I presume that db:open() will give me a sequence of document-nodes. What I presume would work is if I could turn this sequence into a single document-node, somehow. I am not sure if this can be done easily, or efficiently, in XQuery, or whether it would be easier to implement it within BaseX's implementation of db:open, or whether this is not really feasible at all ...

(With that working a similar line of thought would apply to schema validation)

Is there any possibility of getting that working?

Thanks,
Luke

-----Original Message-----
From: Christian Grün <christian.gr...@gmail.com>
Sent: Thursday, 12 December 2019 9:45 PM
To: ERRINGTON Luke <luke.erring...@sydac.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX and validating the entire database

Dear Luke,

I completely agree, serious database applications cannot exist without integrity and consistency checks. In our own projects, checks are realized in XQuery.
Depending on the requirements, we choose one of the following alternatives:

1. If we need to ensure that every single incoming database entity is correct, we apply checks before each update. The resources are also updated via XQuery (see [1,2] for more information) if all checks are successful.

2. If we have control over the data that will be added to a database, and if we know that it's correct as long as the application has no bugs, it is sufficient to check the database at regular intervals (e.g., once every night). This allows us to use the full range of APIs for updating the database (although most of our applications are fully written in XQuery and RESTXQ [3]).

Some straightforward examples of what your checks could look like:

> Is there any way to ensure that when X.xml is added to the database that the
> object IDs that it is referring to actually exist in the database too?

  (: the incoming mapping to be checked :)
  let $doc := <mapping object_from_id="1" object_to_id="2"/>
  (: all object ids currently stored in the database :)
  let $ids := db:open('your-db')//object/@id/data()
  where not($ids = $doc/@object_from_id and $ids = $doc/@object_to_id)
  return error((), 'Unknown id')

> how can I ensure that when a new object xml file is added that it is not
> using an ID that already exists?

  (: the id of the object that is about to be added :)
  let $new-id := '12345'
  where db:open('your-db')//object/@id = $new-id
  return error((), 'Id has already been assigned')

You can organize the highest assigned id in the root node of your database document or (if you work with multiple documents per database) in a dedicated meta document.

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Database_Module
[2] http://docs.basex.org/wiki/XQuery_Update
[3] http://docs.basex.org/wiki/RESTXQ

On Thu, Dec 12, 2019 at 3:08 AM ERRINGTON Luke <luke.erring...@sydac.com> wrote:
>
> Hello,
>
> We are evaluating moving from an RDBMS (Oracle), to BaseX as much of our
> source data originate in XML files and converting to tables in a relational
> schema is painful. In general BaseX looks great!
>
> However, one thing that we lose is referential integrity, and the ability to
> validate data in one XML file that is referring to data in another. Are there
> any possibilities within BaseX or an additional module that can do this?
>
> For example:
> • Can we validate using a schema that applies across a collection
>   of documents, rather than just one?
> • Can we use Schematron (which looks cool) to apply its internal
>   XPaths to the entire collection of documents?
> • Or both?
> • Something else?
>
> We could try using XLinks, but that would involve changing our XML
> data/structure, and my understanding is that BaseX doesn't support (let alone
> validate) them, anyway.
>
> A situation I have in mind is something like (very, very simplified):
>
> A.xml
> <object id="1" name="One">
> </object>
>
> B.xml
> <object id="2" name="Two">
> </object>
>
> X.xml
> <mapping object_from_id="1" object_to_id="2" />
>
> Is there any way to ensure that when X.xml is added to the database that the
> object IDs that it is referring to actually exist in the database too?
>
> I would also like to be able to ensure that all of the <object>s in the
> database have unique id attributes. A schema can do this within a file, but
> how can I ensure that when a new object xml file is added that it is not
> using an ID that already exists?
>
> Thanks for any answers,
> Luke