>>> ". if you look at the sc:* functions you can parse to get to schema. And
>>> then using a few functions to build out the structure you need create a
>>> function that does the transformation for you. "
>>> I did investigate this approach but it was not feasible in the context of
>>> the use cases where json:transform-xxx is targeted.
>>> It may well be in individual cases, but doesn't pan out so well for a
>>> general-purpose library.
Not sure I agree, but would be interested in what areas you felt the sc:
functions are deficient. There are a few issues with some functions, but given
I am not a customer I can't make requests to fix them.
[DAL:] I am not saying they are 'deficient', I am saying that because they
require an *already created* document to query,
they are not generally useful as a means of determining how to
construct that document in the first place.
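To make the distinction concrete -- a rough sketch (the document URI here is
made up for illustration):

    (: you can ask about a node that already exists... :)
    let $existing := fn:doc("/orders/order-1.xml")/order
    return sc:type($existing)

    (: ...but for a document you have not yet constructed there is no
       node to hand to sc:type, so it cannot tell you what shape to build :)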
>>> In general -- to do a schema-based transformation -- you need a schema
>>> for *both sides*.
>>> If you don't care about deterministic transformations or bi-directional
>>> transformations, you can make do with less.
In most cases you are trying to convert the format, not the definition, such
as XML -> JSON or XML (Schema) -> CSV. In XML -> JSON there can be general
rules and some assumption of lossiness, but since the ask is XML (most
complex) -> JSON (simpler), this can be resolved by making some assumptions on
how to treat mixed or other types. Not a panacea, but don't let perfection be
the enemy of the good.
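For example, a minimal sketch with MarkLogic's bundled json library (I am
going from memory on the module path and strategy name, so treat them as
assumptions to verify):

    import module namespace json = "http://marklogic.com/xdmp/json"
        at "/MarkLogic/json/json.xqy";

    (: the "custom" strategy applies general, admittedly lossy rules to
       arbitrary XML; mixed content and attributes get simplified :)
    json:transform-to-json(
      <order><id>42</id><note>rush</note></order>,
      json:config("custom"))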
[DAL:] Yes, rarely do people want to operate on the definition itself. Since
the json:transform library was mentioned, I'm referring to the implicit
question of "Why doesn't it do this for me?"
For the case of a library function, far fewer assumptions are safe
to make. Having a full schema has the potential of eliminating the need to
'assume' as many things.
Which compromises are 'acceptable' is usually something only the
end user, and only in that specific case, can determine.
But yes, one can build systems with 'useful but imperfect'
compromises. And that is what the basic configuration does.
It is when those pre-determined choices -- made by a developer years
before, without your code or data or program to consider -- make 'the wrong
compromises' that we have discussions like these. I'm referring more to the
"Why didn't it already pick the obvious compromise *I wanted*?" question --
as a way to understand the problem, and to help lead to the solution.
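When that happens, the usual escape hatch is configuration -- for example
(a sketch; "array-element-names" is the option name as I recall it, so
verify against your version):

    import module namespace json = "http://marklogic.com/xdmp/json"
        at "/MarkLogic/json/json.xqy";

    (: force <item> to always become a JSON array, even when only one
       occurs in a given document :)
    let $config := json:config("custom")
    let $_ := map:put($config, "array-element-names", "item")
    return json:transform-to-json(<order><item>a</item></order>, $config)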
I wouldn't recommend to anyone anything I didn't already figure out myself.
Your assumption is correct if you ask for a schema without knowledge of the
document it is targeting, but considering you have a target document in
mind (which is always the case) this will return the initial schema:
sc:type($my-element-node-matching-schema) ! <x>{.}</x>/node()
[DAL:] I'll up the ante one further. If you already have the source and target
document 'in mind' you don’t even need the sc:* functions.
As I mentioned, the sc: functions are more useful in one direction,
from XML to JSON. Given XML you can introspect it; having the JSON 'In Mind'
substitutes for the need to have its schema.
However, it is not easy to inject 'In Mind' information into a
library; as your examples show, it generally means you have to hand-code the
entire transformation.
Furthermore -- you need knowledge of the set of possible source
document variations up front, if you want to create a deterministic
transformation.
Simple example: if there is an optional element and it does not
exist in your current document, you can't query whether it *could be there*
in the next one.
That makes the difference between generating an Object, an Array, or
an atomic value at that point.
If the next document does have that element, then a different
representation may need to be used.
That is the kind of decision process that is very difficult (or
impossible) to do without information 'In Mind' informing the entire process.
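To make that concrete (hand-written JSON, not any particular library's
actual output):

    <order><item>a</item></order>
        => { "order" : { "item" : "a" } }         (atomic value? array of one?)
    <order><item>a</item><item>b</item></order>
        => { "order" : { "item" : ["a", "b"] } }  (now it must be an array)

A schema that declares <item> with maxOccurs="unbounded" answers the question
up front: always emit an array, even for a single occurrence.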
Result: a case-specific, hand-crafted transformation will nearly
always do a better job, and take significantly more work, than a
general-purpose one.
The compromise I suggest is to pre- or post-process the data. Convert
the input (using your domain knowledge) into a format that the library's
built-in rules do a good job on,
then take the result and post-process it to 'clean it up' into the exact
output you want.
That often allows you to greatly minimize the amount of coding, and
leverage the best of both your own knowledge and existing code.
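A rough sketch of that pipeline (the element names and the simplification
rule are made up for illustration; I am assuming the bundled json library
behaves as described):

    import module namespace json = "http://marklogic.com/xdmp/json"
        at "/MarkLogic/json/json.xqy";

    (: pre-process: collapse mixed content the generic rules handle
       poorly, using your own domain knowledge :)
    declare function local:simplify($n as node()) as node()
    {
      typeswitch ($n)
        case element(description) return
          element description { fn:string($n) }
        case element() return
          element { fn:node-name($n) } { $n/@*, $n/node() ! local:simplify(.) }
        default return $n
    };

    let $source := <doc><description>Mixed <b>content</b> here</description></doc>
    let $clean  := local:simplify($source)
    return json:transform-to-json($clean, json:config("custom"))
    (: post-process the resulting JSON here: rename keys, fix
       number-vs-string choices, and so on :)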
Given that you have that issue, I recommend using two 32-bit values and doing
the appropriate conversion on the client. Since you have the XML schema, you
can already account for that.
http://stackoverflow.com/questions/209869/what-is-the-accepted-way-to-send-64-bit-values-over-json
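For reference, the split itself is simple arithmetic -- a sketch in plain
XQuery (xs:integer is arbitrary-precision, so the server side is the easy
part; object-node is MarkLogic's JSON constructor):

    let $v    := 1223445593935925
    let $high := $v idiv 4294967296   (: $v div 2^32 :)
    let $low  := $v mod 4294967296    (: $v mod 2^32 :)
    return object-node { "high32" : $high, "low32" : $low }

The client then reconstructs the value as high32 * 2^32 + low32.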
[DAL:] That is definitely a valid approach -- but I take issue with
'accepted'. It only works if you are in control of the output format. Even
then, it is not generally a format that the consumers are happy with. I have
actually met *none* that would prefer this format
(e.g. { "low32" : 123435, "high32" : 3235992 } over a string
{ "value" : "1223445593935925" }). The problem only arises in the first place
when the target system cannot directly represent 64-bit integers as 'Numbers'
-- and that is only problematic if you need to do numeric operations on them.
Otherwise it's much easier to 'pass along' a string value, display it, even do
inequality operations, than with a structured value. It is also more compact,
more readable, and more efficient.
No question, but it's not impossible to create a JSON schema from an XML
Schema, assuming some loss of precision. The goal is not to naively assume a
panacea (no argument here), but to use the knowledge of your XML schema to
make general rules about how to process. Using JSON Schema you can assume it
is a string with the notion of a format.
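E.g. something along these lines ("format" is an open-ended keyword in JSON
Schema, so a value like "int64" is legal but only meaningful if producer and
consumer agree on it):

    { "type" : "string", "format" : "int64", "pattern" : "^-?[0-9]+$" }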
[DAL:] Of course. That would be very useful. To date, I have not yet run
into a customer case that had an XML schema for a JSON document they wanted to
transform.
So my 'gut feeling' would be that it's more useful to write the JSON schema
directly.
But the more implementations the better. JSON Schema is still very
infrequently used, and when it is, it's rarely perfectly conformant with *any*
JSON Schema standard.
Hopefully time will improve this.
>>Then implement it simply. Produce what is desired, not what is asked for.
>>Make People Happy.
Not trying to solve universal problems, just trying to steer one toward a
path of happiness rather than frustration.
[DAL:] A very good goal.
David, given that you are an engineer for MarkLogic, I realize you are
obliged to give the sanctioned answer from ML,
[DAL:] Not true, my answers are my own.
but you need to consider that not everyone needs the universal or supported
answer; some just want to know what is possible.
[DAL:] Completely agree. I will up that one notch: my goal is to educate on
not only what is possible, but also how one might
combine 'one-off' solutions with generic ones that pre-exist. For that, it
helps to understand the problem in general.
Not just for this time, but the next. Rarely is a problem asked that does not
recur later in a slightly different shape.
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general