Hi Luke,

I would like to emphasize (or simply remind you of) two key features of XPath 
(and of XML technology in general). The FIRST is that information is treated 
identically whether it lives in a single document, a collection of documents 
or a collection of document fragments. So, for example, $data//foo works 
regardless of whether $data is one document, a collection of documents, a 
single element extracted from some document, a collection of elements 
extracted from multiple documents, or even a mixture of documents exposed by a 
database, the file system, REST service responses, etc. Therefore, in my 
opinion, collecting documents into a single document prior to processing runs 
somewhat against the grain of what XML technology excels at.
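A minimal sketch of that first point, assuming hypothetical inline documents that mirror Luke's A.xml, B.xml and X.xml. In BaseX they would normally come from db:open('your-db'), where 'your-db' is a placeholder name. The same path expression is applied to one document, to several documents, and to a mix of a document and a free-standing element:

```xquery
(: Hypothetical sample data, built inline so the query runs without a
   database; in BaseX the documents would come from db:open('your-db'). :)
let $a    := document { <object id="1" name="One"/> }
let $b    := document { <object id="2" name="Two"/> }
let $x    := document { <mapping object_from_id="1" object_to_id="2"/> }
(: An element fragment that is not wrapped in any document node :)
let $frag := <group><object id="3" name="Three"/></group>

(: One path expression, three kinds of input. sort() is applied because
   the relative order of nodes from different trees is not fixed. :)
return (
  string-join($a//object/@id, ','),
  string-join(sort(($a, $b, $x)//object/@id ! string()), ','),
  string-join(sort(($a, $frag)//object/@id ! string()), ',')
)
```

The query should return the strings "1", "1,2" and "1,3". Nothing in the path expression depends on how the input was assembled.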

The SECOND point is that XPath has been specified with mathematical precision, 
so I cannot imagine a more precise and concise way of defining *rules*. (That 
XPath expressions cannot easily replace a grammar is a different matter, of 
course.)

And finally, I would not overemphasize the importance of using Schematron, as 
equivalent validation functionality is fairly easy to implement using just 
XQuery/XPath: the XPath language is the engine and heartbeat of it all, and 
whether one uses the Schematron framework is a secondary question, ingenious 
and handy though it is for typical single-document checks.
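Here is an illustration of how Luke's Schematron rule could be expressed in plain XQuery. It is a sketch, not a full Schematron replacement, and it checks every mapping against the ids of the whole collection. The inline documents stand in for db:open('your-db'). The <failed-assert> element is a hypothetical report format loosely named after Schematron's output, not actual SVRL:

```xquery
(: Stand-in for db:open('your-db'): three inline documents, one of which
   refers to a missing object (id 3). :)
let $docs := (
  document { <object id="1" name="One"/> },
  document { <object id="2" name="Two"/> },
  document { <mapping object_from_id="1" object_to_id="3"/> }
)
(: All object ids of the whole collection, not just of one document :)
let $ids := $docs//object/@id/string()
(: "rule context": every mapping; "assert": each referenced id exists :)
for $ref in $docs//mapping/(@object_from_id, @object_to_id)
where not($ref = $ids)
return <failed-assert attribute="{ name($ref) }" value="{ $ref }">Trying to map invalid object id</failed-assert>
```

With the sample data, it should return a single <failed-assert> element with attribute="object_to_id" and value="3".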

Cheers, 
Hans

On Friday, 13 December 2019, 07:53:48 CET, ERRINGTON Luke 
<luke.erring...@sydac.com> wrote:

Hi Christian,

Thank you for your time in preparing your response and examples. You describe 
the approach that I thought would be necessary if we couldn't get some sort of 
schema validation to work. Unfortunately, specifying the validation 
requirements in XQuery code is not as clean, clear or minimal as might be 
desired.

It would be nice to have some sort of pre-commit hook for validating 
modifications to the database, so that we are not restricted to allowing 
modifications only through XQuery. This seems to be the point of 
https://github.com/BaseXdb/basex/issues/1082, but that issue appears to be on 
hold after some significant discussion.

Presumably I could achieve schema validation by keeping the entire data set 
inside one document, but that would lose the benefits of collections and of 
having the data arranged similarly to a file system. So I was hoping that I 
could define a Schematron rule something like this (untested, because I'm 
struggling to get Schematron working at the moment, with the error "content is 
not allowed in prolog"):

<schema xmlns="http://purl.oclc.org/dsdl/schematron">
    <pattern>
        <!-- every mapping must refer to existing object ids -->
        <rule context="mapping">
            <assert test="@object_from_id = //object/@id">Trying to map invalid object id</assert>
            <assert test="@object_to_id = //object/@id">Trying to map invalid object id</assert>
        </rule>
    </pattern>
</schema>

This is relatively minimal and expressive. It relies purely on XPath, so all I 
need is for //object/@id to find the object IDs present in all documents, not 
just this one. And when I evaluate //object/@id as a path in BaseX, it does 
exactly that! It returns all of the object IDs in all of the documents, so 
maybe this schema can be used across all documents at once! That would be 
fantastic!

Of course, in practice I am not sure whether this can be done, and I am pretty 
new to all of this. I see that schematron:validate currently requires a node as 
input. I presume that db:open() will give me a sequence of document nodes, so 
what I presume would work is turning this sequence into a single document node 
somehow. I am not sure whether this can be done easily or efficiently in 
XQuery, whether it would be easier to implement within BaseX's implementation 
of db:open, or whether it is not really feasible at all ...
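One way to do what is asked here, sketched under the assumption that copying all data is acceptable, is to wrap the documents in a new element inside a new document node. Inline documents again stand in for db:open('your-db'), and the wrapper name <collection> is hypothetical:

```xquery
(: Stand-in for db:open('your-db'), which returns one document node per
   stored resource. :)
let $docs := (
  document { <object id="1" name="One"/> },
  document { <object id="2" name="Two"/> },
  document { <mapping object_from_id="1" object_to_id="2"/> }
)
(: An element constructor replaces each document node in its content by
   that document's children, so all root elements become siblings under
   one new root. The nodes are copied, which costs time and memory
   proportional to the size of the data. :)
let $combined := document { <collection>{ $docs }</collection> }

(: '//' starts at the root of the tree containing the context node: :)
let $before := root($docs//mapping)//object/@id ! string()
let $after  := root($combined//mapping)//object/@id ! string()
return (count($before), count($after))
```

It should return (0, 2). Seen from the mapping's own document, //object finds nothing. In the combined document it finds both objects. The combined node could then be passed to a function that expects a single node, at the cost of copying every document first.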

(If that worked, a similar line of thought would apply to schema validation.)

Is there any possibility of getting that working?

Thanks,
Luke

-----Original Message-----
From: Christian Grün <christian.gr...@gmail.com> 
Sent: Thursday, 12 December 2019 9:45 PM
To: ERRINGTON Luke <luke.erring...@sydac.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX and validating the entire database

Dear Luke,

I completely agree: serious database applications cannot exist without 
integrity and consistency checks. In our own projects, checks are realized in 
XQuery. Depending on the requirements, we choose one of the following 
alternatives:

1. If we need to ensure that every single incoming database entity is correct, 
we apply checks before each update. If all checks succeed, the resources are 
then updated via XQuery as well (see [1,2] for more information).

2. If we have control over the data that will be added to a database, and if we 
know that it is correct as long as the application has no bugs, it is 
sufficient to check the database at regular intervals (e.g., once every night). 
This allows us to use the full range of APIs for updating the database 
(although most of our applications are written entirely in XQuery and RESTXQ [3]).

Some straightforward examples of what your checks could look like:

> Is there any way to ensure that when X.xml is added to the database that the 
> object IDs that it is referring to actually exist in the database too?

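  (: the new mapping to be checked (stand-in for the incoming X.xml) :)
  let $doc := <mapping object_from_id="1" object_to_id="2" />
  (: all object ids currently stored in the database :)
  let $ids := db:open('your-db')//object/@id/data()
  (: reject the mapping if either referenced id is missing :)
  where not($ids = $doc/@object_from_id and $ids = $doc/@object_to_id)
  return error((), 'Unknown id')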

> how can I ensure that when a new object xml file is added that it is not 
> using an ID that already exists?

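  (: id of the object that is about to be added :)
  let $new-id := '12345'
  (: reject it if any stored object already uses that id :)
  where db:open('your-db')//object/@id = $new-id
  return error((), 'Id has already been assigned')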
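For the second alternative (periodic checks rather than checks on every update), both questions can be combined into one report over the whole database. This is a sketch, assuming inline sample data in place of db:open('your-db'), with a duplicate id and a dangling reference added on purpose:

```xquery
(: Stand-in for db:open('your-db'); it deliberately contains a duplicate
   id (2) and a dangling reference (4). :)
let $docs := (
  document { <object id="1" name="One"/> },
  document { <object id="2" name="Two"/> },
  document { <object id="2" name="Also two"/> },
  document { <mapping object_from_id="1" object_to_id="4"/> }
)
let $ids := $docs//object/@id/string()
return (
  (: ids used by more than one object :)
  for $id in distinct-values($ids)
  where count($ids[. = $id]) > 1
  return 'Duplicate id: ' || $id,
  (: mapping attributes that point to no existing object :)
  for $ref in $docs//mapping/(@object_from_id, @object_to_id)
  where not($ref = $ids)
  return 'Unknown id: ' || $ref
)
```

With this data it should return "Duplicate id: 2" and "Unknown id: 4". For the check-before-update alternative, the same conditions can call error() instead, as in the examples above.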

You can store the highest assigned id in the root node of your database 
document or (if you work with multiple documents per database) in a dedicated 
meta document.
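Here is a sketch of the counter idea, assuming a hypothetical meta document <meta last-id="..."/>. The element and attribute names are made up for illustration. It uses XQuery Update's copy/modify expression so that it runs without a database:

```xquery
(: Hypothetical meta document; in a database it might be stored as, e.g.,
   'meta.xml' and read with db:open('your-db', 'meta.xml'). :)
let $meta := document { <meta last-id="12345"/> }
(: copy/modify updates a copy, so no database is needed here; against a
   stored document the same 'replace value of node' expression could be
   run directly as an updating query. :)
return copy $m := $meta
       modify replace value of node $m/meta/@last-id
              with xs:integer($m/meta/@last-id) + 1
       return string($m/meta/@last-id)
```

It should return "12346".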

Hope this helps
Christian

[1] http://docs.basex.org/wiki/Database_Module
[2] http://docs.basex.org/wiki/XQuery_Update
[3] http://docs.basex.org/wiki/RESTXQ



On Thu, Dec 12, 2019 at 3:08 AM ERRINGTON Luke <luke.erring...@sydac.com> wrote:
>
> Hello,
>
> We are evaluating moving from an RDBMS (Oracle) to BaseX, as much of our 
> source data originates in XML files and converting it to tables in a relational 
> schema is painful. In general BaseX looks great!
>
> However, one thing that we lose is referential integrity, and the ability to 
> validate data in one XML file that is referring to data in another. Are there 
> any possibilities within BaseX or an additional module that can do this?
>
> For example:
> •  Can we validate using a schema that applies across a collection of 
>    documents, rather than just one?
> •  Can we use Schematron (which looks cool) to apply its internal XPaths to 
>    the entire collection of documents?
> •  Or both?
> •  Something else?
>
> We could try using XLinks, but that would involve changing our XML 
> data/structure, and my understanding is that BaseX doesn’t support (let alone 
> validate) them, anyway.
>
> A situation I have in mind is something like (very, very simplified):
>
> A.xml
> <object id="1" name="One">
> </object>
>
> B.xml
> <object id="2" name="Two">
> </object>
>
> X.xml
> <mapping object_from_id="1" object_to_id="2" />
>
> Is there any way to ensure that when X.xml is added to the database that the 
> object IDs that it is referring to actually exist in the database too?
>
> I would also like to be able to ensure that all of the <object>s in the 
> database have unique id attributes. A schema can do this within a file, but 
> how can I ensure that when a new object xml file is added that it is not 
> using an ID that already exists?
>
> Thanks for any answers,
> Luke
  
