Re: Proposals for syntax to unroll an inner struct/map?

Jinfeng Ni Mon, 05 Oct 2015 14:44:49 -0700

This proposal makes sense to me. It's quite consistent to the current
behavior when dealing with the top level * column. Also, I like the
idea that we start with a restricted case.


For v1 syntax:

select t.id, t.b.* from t;

Given the input
{
  id: 123,
  b: {
    x:10,
    y:20
  }
}

I'm wondering if we should get

{
  id: 123,
  b.x: 10,
  b.y:20
}

In stead of
{
  id: 123,
  x: 10,
  y:20
}

For t.* as the top level * case, t is either a table name or table
alias, and in the output field, we do not keep 't'. However, in the
case of t.b.*, I feel 'b' is different from 't', since 'b' is field
name in the record; it would be better to keep 'b' in the output field
after expanding t.b.*.

This is also useful to resolve naming conflicts, if we have

select t.b.*, t.c.* from t;

where b and c both have "x" and "y".



On Sun, Oct 4, 2015 at 3:07 PM, Jacques Nadeau <[email protected]> wrote:
> I agree with Julian about use cases.
>
> As I see it, we have one pressing issue: right now there is no way to move
> a map upward without have to manually select all the fields. A large number
> of users I've spoken to have found this frustrating and limiting (and not
> possible depending on the dynamic nature of their data). I think we should
> solve this now. In general, this thread all seem to agree that the 'select
> t.b.* from t' makes sense for that use case. We can limit this behavior as
> follows: the selection can include only one star and it must be the
> termination of selection string. That means the following are allowed:
>
> select * from t (already supported)
> select t.*  from t (already supported)
> select t.a.* from t
> select t.a.* as x from t (the x would simply be ignored)
> select t.a.b.c.d.e.*  from t
>
> however, the following would not be supported:
> select t.*.a
> select t.b.*.*
> select t.b.*.x.*
>
> Complicated pattern matching is something that hasn't been an issue for
> most customers I've spoken to and should be shelved until we have a solid
> set of requirements. In the meantime, custom UDFs can solve this for
> customers. Once we have a bunch of UDFs floating around for these and their
> supporting use cases, we can come up with a more consolidated approach.
>
> As far as the JSON_TABLE stuff is concerned, I'm not inclined to adopt any
> of it since it seems far too complex for most users. (It seems like it is
> trying to fit json into a sql paradigm rather than shape sql to support the
> flexibility and expressive of json.)
>
> How does that sound as a first step?
>
>
>
> On Sun, Oct 4, 2015 at 5:47 PM, Julian Hyde <[email protected]> wrote:
>
>> For expanding one level, I like the first syntax, "select t.id, t.b.*
>> from t;".
>>
>> Why? It is compatible with SQL (in particular how SQL assigns default
>> aliases to expressions), and is consistent how shell globbing works.
>> The effect is as if the user wrote 'select t.id, t.b.x, t.b.y from t'.
>>
>> For multiple levels, I think it becomes a different problem. Remember
>> the analogy with filename globbing, and in particular, how you
>> selectively delete files in a directory tree. Windows has a recursive
>> delete command that does something akin to globbing inside the command
>> (Windows shell does not do globbing). Unix globbing doesn't work
>> recursively, so you use the 'find' command, which let's you control
>> your search based on depth and file attributes as well as file name.
>>
>> If you are expanding deep json documents, I think you need the same
>> amount of control. The documents might contain multiple fields of the
>> same name, especially if the document has repeating elements, and
>> sometimes you might want to return objects and arrays from deep in the
>> tree, not just leaves.
>>
>> So, for recursive expansion, I think we would need a UDF. It's worth
>> looking at JSON_TABLE[1] in Oracle and the SQL standard.
>>
>> For two-level expansion, people might be happy to write
>>
>> select t.id, t.*, t.*.* from t
>>
>> or even, as Ted suggests,
>>
>> select t.id, t.*, t.(x,y).* from t
>>
>> (note the similarity with unix file globbing syntax {x,y}).
>>
>> Let's find a couple of motivating use cases for this before we commit
>> to a syntax. I bet they will pose issues that do not occur in the toy
>> examples.
>>
>> Julian
>>
>> [1]
>> https://docs.oracle.com/cloud/latest/db121/SQLRF/functions091.htm#SQLRF56973
>>
>> On Sun, Oct 4, 2015 at 12:26 PM, Ted Dunning <[email protected]>
>> wrote:
>> >
>> > I would expect full recursion.  Anything short of that seems highly
>> inconsistent.
>> >
>> > Sent from my iPhone
>> >
>> >> On Oct 4, 2015, at 14:27, Aman Sinha <[email protected]> wrote:
>> >>
>> >> The v1 syntax 'select t.id, t.b.* from t;' seems reasonable to me for
>> the
>> >> first level.  The alias can be ignored for the star case.  This is what
>> we
>> >> do (and consistent with Postgres behavior) for queries such as  'select
>> r.*
>> >> as x from region r;'.
>> >>
>> >> For multiple levels, would there be a requirement to expand recursively
>> to
>> >> more levels within b ? or is that not a valid requirement ? For example:
>> >>    b: {
>> >>      c: { x:10,
>> >>            y:20
>> >>          },
>> >>      d: { p: 5,
>> >>            q: 15
>> >>          }
>> >>  }
>> >>
>> >> Is b.* expected to expand only to c and d levels or further ?
>>

Re: Proposals for syntax to unroll an inner struct/map?

Reply via email to