Re: nested structs. querying and building metadata in calcite

Julian Hyde Tue, 14 Aug 2018 12:47:16 -0700

Rather than relational vs. non-relational, I think a more interesting split is 
strongly-typed vs. weakly-typed (think java vs. javascript):
* SQL is strongly typed. You know that a result set is a multi-set, and that 
each element is a row, and the names/types of that row’s fields.
* Document stores are often weakly typed. You know that a result set will be a 
collection of JSON elements, but you don’t know what fields they will have.


Can we bring them together without doing too much harm to each? Yes, I believe 
we can, in just the same way that you can access JSON data in strongly-typed 
Java. In Java you write code like this:

void foo(Object json) {
  Map<String, Object> map = (Map) json;
  int orderId = (Integer) map.get(“orderId);
  List<Map<String, Object>> lineItems = map.get(“lineItems”);
  for (Map<String, Object> lineItem: lineItems) {
    String productName = lineItem.get(“productName”);
  }
}

In SQL, we have a MAP type for JSON objects, we have a MULTISET type for JSON 
lists, and an ITEM operator to get elements from MAPs and MULTISETs, and we 
have CAST. All the same tools as Java. (I guess we also need INSTANCEOF.) 

Julian


> On Aug 14, 2018, at 5:56 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> Andrei Sereda> I still have one question / clarification regarding
> semantics of select *
> for document databases (with declared schema).
> 
> SQL was conceived for relational data thus the semantics may be ambiguous
> for other kinds of data.
> Probably by now, there are works defining the semantics of SQL queries over
> document stores but I don't have in mind something in particular (or
> something that has been standardized).
> If somebody else has something in mind, I would be also interested in a
> reference.
> 
> Missing vs. null values is not a problem with SQL and relational data but
> rather with other kinds of data such as XML and JSON.
> If you map your documents to relational data (and SQL types) there
> shouldn't be any ambiguity at least at query time.
> 
> 
> 2018-08-14 15:38 GMT+03:00 Stamatis Zampetakis <zabe...@gmail.com>:
> 
>> @Vladimir
>> 
>> I don't have a concrete opinion on the proxy approach but rather some
>> general comments regarding the use-case you provided.
>> 
>> Basically, if I understood well you want to provide a query language for
>> analyzing heap dumps that is more expressive than OQL <http://cr.openjdk.
>> java.net/~sundar/8022483/webrev.01/raw_files/new/src/
>> share/classes/com/sun/tools/hat/resources/oqlhelp.html>.
>> 
>> It seems that the challenge is how to represent Java classes as RDBMS
>> tables and in particular how to handle cases where there are recursive
>> definitions.
>> 
>> Using struct types seems nice in particular for querying but this means
>> that possibly many objects from various different classes are under a
>> single table.
>> In addition, the same object can be stored multiple times under different
>> tables.
>> 
>> class A {
>>  C fA;
>> }
>> 
>> class B {
>> C fB;
>> }
>> 
>> class C {
>> int fC;
>> }
>> 
>> When mapping class A to a relational table the question is what is the
>> data type of column fA and for that, I don't think a struct type C is a
>> good idea.
>> Structs are naturally used for composition while from your example we
>> mostly want an association.
>> 
>> Furthermore, the way you use the DOT notation in queries it mostly means a
>> join rather than field access.
>> Thus I would say that these kind of queries are closer to JPQL and OQL
>> than to pure SQL.
>> 
>> Since you want to use Calcite to analyze heap dumps it may be easier if
>> you stick to a more traditional RDBMS approach.
>> From a quick look at your project, it seems that this is what you are
>> doing so I guess you are not relying on struct types after all.
>> 
>> Thus for the moment, I am not sure if the recursive struct definition is a
>> must-have feature for Calcite.
>> 
>> 
>> 2018-08-14 0:09 GMT+03:00 Michael Mior <mm...@apache.org>:
>> 
>>> Returning a list of mytype seems preferable. As far as missing vs. null
>>> attributes, I think the distinction is commonly not that important and if
>>> it is, there should be some method of accessing the raw document as a
>>> fallback.
>>> 
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>> 
>>> 
>>> 
>>> Le lun. 13 août 2018 à 16:40, Andrei Sereda <and...@sereda.cc> a écrit :
>>> 
>>>> Thanks all for your answers.
>>>> 
>>>> I still have one question / clarification regarding semantics of select
>>> *
>>>> for document databases (with declared schema).
>>>> 
>>>> Say I have the following type definition:
>>>> 
>>>> CREATE TYPE mytype AS (
>>>>    a varchar(2) not null,
>>>>    b varchar(2) NULL // optional (null?)
>>>> );
>>>> 
>>>> If my document has only a present ({ a:value }) what should select *
>>> return
>>>> ? Map with single value ({a:value}) or a pair (value, null)
>>>> 
>>>> In other words should select * return raw document (as generic map) or
>>> list
>>>> of mytype ?
>>>> If later, how to differentiate between missing attribute (field) and
>>>> attribute having null value ?
>>>> 
>>>> On Sat, Aug 11, 2018 at 2:51 PM Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> As I noted in https://issues.apache.org/jira/browse/CALCITE-2464 <
>>>>> https://issues.apache.org/jira/browse/CALCITE-2464>, SQL struct
>>> types do
>>>>> not behave exactly like Java classes (more like Java value types). If
>>> the
>>>>> semantics are not as desired, maybe we’ll have to design a new type
>>>>> constructor.
>>>>> 
>>>>> Since Calcite is grounded in SQL, I would encourage people to give
>>>>> examples in terms of SQL (DDL, queries, results), not just in terms of
>>>> the
>>>>> Java APIs.
>>>>> 
>>>>> Lastly, I’ll draw your attention to Shuyi’s great work on “CREATE
>>> TYPE”
>>>>> (see https://issues.apache.org/jira/browse/CALCITE-2045 <
>>>>> https://issues.apache.org/jira/browse/CALCITE-2045>). He extended
>>> DDL in
>>>>> the “server” module, so you can try out his examples.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Aug 11, 2018, at 9:03 AM, Vladimir Sitnikov <
>>>>> sitnikov.vladi...@gmail.com> wrote:
>>>>>> 
>>>>>> Just to clarify the use case: I'm building SQL plugin to analyze
>>> Java
>>>>> heap
>>>>>> dumps.
>>>>>> https://github.com/vlsi/mat-calcite-plugin
>>>>>> 
>>>>>> select * from "java.lang.String" s  produces a row for each String
>>> in
>>>> the
>>>>>> heap dump.
>>>>>> 
>>>>>> Then might be a case like
>>>>>> select u.path from  "java.net.URL" u;
>>>>>> That is java.net.URL has "path" field which is of java.lang.String.
>>>>>> 
>>>>>> Of course Java classes can produce recursive types, so Node { Node
>>>> next;
>>>>> }
>>>>>> bothered me.
>>>>>> 
>>>>>> The relevant issue is
>>>> https://issues.apache.org/jira/browse/CALCITE-207
>>>>>> 
>>>>>> I have asked once if RelDataTypeProxy is welcome in Calcite (
>>>>>> 
>>>>> 
>>>> https://issues.apache.org/jira/browse/CALCITE-207?focusedCom
>>> mentId=14035245&page=com.atlassian.jira.plugin.system.
>>> issuetabpanels:comment-tabpanel#comment-14035245
>>>>>> ), however it looks like I have to implement it and see what breaks.
>>>>>> 
>>>>>> The idea there was to use RelDataTypeProxy("Node") as a type for the
>>>>> "next"
>>>>>> field. That should avoid "stackoverflow" on cyclic dependencies in
>>>> types.
>>>>>> I would love to know your opinion on that if you happen to have one.
>>>>>> 
>>>>>> It's great that you update executor to support nested structs.
>>>>>> 
>>>>>> PS. I've not had a chance to review it.
>>>>>> 
>>>>>> Vladimir
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>>

Re: nested structs. querying and building metadata in calcite

Reply via email to