Gavin,

"FORMAT JSON" is an operator, it converts the json string into an internal
`JsonValueContext`[1],
and the `JsonValueContext` holds the deserialized json object which will be
further used by
the following json functions.

"FORMAT JSON" is just one choice of "JSON output clause", which is stated
in the standard as:
<JSON output clause> ::=
  RETURNING <data type>
      [ FORMAT <JSON representation> ]
<JSON representation> ::=
    JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]
  | <implementation-defined JSON representation option>

Other options could be "FORMAT AVRO", "FORMAT BSON" (which is not
implemented yet in calcite).

"JSON output clause" only controls the serialization format for json
function's output,
and "JSON FORMAT" should be the default. Hence the behavior without "FORMAT
JSON"
should be corrected IMHO.

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L101

Gavin Ray <[email protected]> 于2022年3月22日周二 03:48写道:

> Benchao,
>
> I owe you a beer, thank you!!
> Using FORMAT JSON fixes everything -- it just has to be specified on every
> JSON function.
>
> See images below if curious:
>
> Without FORMAT JSON:
> https://i.imgur.com/ekgFXqe.png
>
> With FORMAT JSON:
> https://i.imgur.com/8lHIqUZ.png
>
> This is fantastic news =D
> Now I am just curious how the "FORMAT JSON" modifier works.
>
> On Sun, Mar 20, 2022 at 10:42 PM Gavin Ray <[email protected]> wrote:
>
> > Hot damn, you're right -- that doesn't look like it's double-escaped!
> > I will try to this out ASAP and post an update here, thank you for the
> tip
> > =)
> >
> >
> > On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <[email protected]> wrote:
> >
> >> Hi Gavin,
> >>
> >> Have you ever tried 'FORMAT JSON' like this[1], the tests show that the
> >> result of 'format json' is what you want.
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
> >>
> >>
> >> Gavin Ray <[email protected]> 于2022年3月19日周六 05:19写道:
> >>
> >> > That sounds very reasonable to me
> >> >
> >> > I don't know the Calcite codebase as well as other folks -- and
> >> certainly
> >> > not nearly as well as you
> >> > Where would be the place to put such a thing/the overall approach?
> >> >
> >> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <[email protected]>
> >> > wrote:
> >> >
> >> > > I think you’re proposing making the JSON_ functions smarter at
> >> runtime.
> >> > My
> >> > > general philosophy is to have the smarts at prepare time and make
> the
> >> > > runtime operators dumb. I think that philosophy can be applied here.
> >> Some
> >> > > extra logic would kick in when preparing a query that has JSON_
> >> > functions,
> >> > > and that logic would fold together multiple nested JSON_ functions.
> >> > Perhaps
> >> > > we need to add a new (internal) JSON_ function that can do all of
> the
> >> > steps.
> >> > >
> >> > > Julian
> >> > >
> >> > >
> >> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <[email protected]>
> >> wrote:
> >> > > >
> >> > > > Sorry to beat a dead horse here, but I'm one of those weirdos that
> >> > gets a
> >> > > > lot of use out of Calcite's JSON operators.
> >> > > > Calcite's JSON implementation is broken for queries that have more
> >> than
> >> > > one
> >> > > > depth of JSON object/array calls.
> >> > > >
> >> > > > The reason is because the operator calls "jsonize()", which parses
> >> the
> >> > > > (presumably) JVM object as a JSON string
> >> > > > This works for something like Map<String, String>, but if you have
> >> > > > Map<String, Map<String, String>>, what happens is this:
> >> > > >
> >> > > > JSON_OBJECT(
> >> > > >  foo: 1,
> >> > > >  bar: JSON_OBJECT(
> >> > > >     qux: 2
> >> > > >  )
> >> > > > )
> >> > > >
> >> > > > The parse happens inside-out, so first we get the innermost object
> >> > > parsed,
> >> > > > which gives:
> >> > > >
> >> > > > { "qux": 2 }
> >> > > >
> >> > > > But -- as a string! This is important!
> >> > > > Now, when we parse the next object, we get this:
> >> > > >
> >> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> >> > > >
> >> > > > This is because the object with "qux" isn't an object, but a
> string
> >> > value
> >> > > > Which to be valid JSON, needs to have its quotes and braces
> escaped
> >> > > >
> >> > > > Definitely not what you want, and the value isn't usable =(
> >> > > >
> >> > > > Since there is no state/context/stack (that I can tell) when the
> >> parse
> >> > > > function is called,
> >> > > > how might it be possible to write something to the effect of:
> >> > > >
> >> > > > "Analyze the query, and if the number of JSON operations is
> greater
> >> > than
> >> > > > one,
> >> > > > only call 'jsonize()' on the outer-most parse/object."
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> >> > >
> >> > >
> >> >
> >>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >>
> >
>


-- 

Best,
Benchao Li

Reply via email to