[GitHub] [druid] paul-rogers commented on a diff in pull request #12647: Foundation for the Druid metadata catalog

GitBox Tue, 12 Jul 2022 15:20:13 -0700


paul-rogers commented on code in PR #12647:
URL: https://github.com/apache/druid/pull/12647#discussion_r919471739



##########
server/src/main/java/org/apache/druid/catalog/DatasourceColumnSpec.java:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.catalog;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonSubTypes;
+import com.fasterxml.jackson.annotation.JsonSubTypes.Type;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.StringUtils;
+
+/**
+ * Description of a detail datasource column and a rollup
+ * dimension column.
+ */
+@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
+@JsonSubTypes(value = {
+    @Type(name = "detail", value = DatasourceColumnSpec.DetailColumnSpec.class),
+    @Type(name = "dimension", value = DatasourceColumnSpec.DimensionSpec.class),
+    @Type(name = "measure", value = DatasourceColumnSpec.MeasureSpec.class),
+})
+public abstract class DatasourceColumnSpec extends ColumnSpec
+{
+  private static final String TIME_COLUMN = "__time";
+
+  @JsonCreator
+  public DatasourceColumnSpec(
+      @JsonProperty("name") String name,
+      @JsonProperty("sqlType") String sqlType
+  )
+  {
+    super(name, sqlType);
+  }
+
+  @Override
+  public void validate()
+  {
+    super.validate();
+    if (sqlType == null) {
+      return;
+    }
+    if (TIME_COLUMN.equals(name)) {
+      if (!"TIMESTAMP".equalsIgnoreCase(sqlType)) {
+        throw new IAE("__time column must have type TIMESTAMP");
+      }
+    } else if (!VALID_SQL_TYPES.containsKey(StringUtils.toUpperCase(sqlType))) {

Review Comment:
   @clintropolis, thanks for the note. The catalog work has not yet reached this
level of detail, and I look forward to discussing the specifics when we get
there. (That will be soon; for now, the work is still wrestling with the input
table side of things.)
   
   We can, however, think about some constraints. One is on the planner side:
Calcite (and SQL) know only two facts about any column: its name and its type.
Anything we want to say about the column (other than its name) has to be
embedded into the Calcite type. This is not much of a restriction, however:
Calcite is pretty liberal about what types can do. In Drill, for example, we
have types for things such as maps, dictionaries, variants, and n-dimensional
arrays of any type. In the SQL world, some databases have an `INTERVAL` type
with a field range: `INTERVAL DAY TO MINUTE`, say.
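
   To make the point concrete, here is a minimal Java sketch of a parameterized
type in the spirit of the interval example. All names are hypothetical (this is
not Calcite's interval type class): the type carries its leading and trailing
fields and rejects combinations that standard SQL disallows, such as
`MINUTE TO DAY`, or mixing year-month fields with day-time fields.

```java
import java.util.Objects;

// Hypothetical: interval fields, ordered from most to least significant.
enum IntervalField { YEAR, MONTH, DAY, HOUR, MINUTE, SECOND }

// Hypothetical parameterized type; not Calcite's own representation.
record IntervalType(IntervalField leading, IntervalField trailing) {
  IntervalType {
    Objects.requireNonNull(leading, "leading");
    Objects.requireNonNull(trailing, "trailing");
    // The leading field must be at least as significant as the trailing one.
    if (leading.compareTo(trailing) > 0) {
      throw new IllegalArgumentException(
          "INTERVAL " + leading + " TO " + trailing + ": fields are out of order");
    }
    // Year-month and day-time intervals cannot be mixed.
    if (isYearMonth(leading) != isYearMonth(trailing)) {
      throw new IllegalArgumentException(
          "INTERVAL " + leading + " TO " + trailing + ": mixes year-month and day-time fields");
    }
  }

  private static boolean isYearMonth(IntervalField field) {
    return field == IntervalField.YEAR || field == IntervalField.MONTH;
  }

  // The SQL spelling of the type, e.g. "INTERVAL DAY TO MINUTE".
  String sqlName() {
    return leading == trailing
        ? "INTERVAL " + leading
        : "INTERVAL " + leading + " TO " + trailing;
  }
}
```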
   
   So, to the degree that any thought has gone into the type system, the
conclusion is just that we need a way to express aggregates (and dimensions) via
types. In fact, we must already do so for the new `INSERT` functionality: it is
on the "to do" list to review what we did there and reuse it.
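
   A minimal sketch of that idea, using hypothetical Java types (`ColumnType`,
`ScalarType`, `MeasureType`, `Column`) rather than anything in this PR: the
column holds only a name and a type, so whether it is a measure, and how it
rolls up, is read entirely from the type. The `AGGREGATE USING` spelling borrows
the form Gian suggests and is illustrative only.

```java
// Hypothetical types: the planner sees only (name, type), so anything else
// about a column, such as "this is a SUM measure", must live in the type.
interface ColumnType {
  String sqlName();
}

// A plain SQL type such as VARCHAR or BIGINT (dimensions, detail columns).
record ScalarType(String sqlName) implements ColumnType {}

// A measure: the stored type plus the aggregate applied at rollup.
record MeasureType(String storageType, String aggregator) implements ColumnType {
  @Override
  public String sqlName() {
    // Illustrative spelling only; the real syntax is still under discussion.
    return storageType + " AGGREGATE USING " + aggregator;
  }
}

// A column carries nothing but its name and its type.
record Column(String name, ColumnType type) {
  boolean isMeasure() {
    return type instanceof MeasureType;
  }
}
```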
   
   Then, we need a way to express that type syntactically. The syntax noted
above is one way. Gian suggests a more traditional spelling out of the details:
`sumX BIGINT AGGREGATE USING SUM`, say. The syntax can be anything we want, as
long as we can get Calcite to parse it into our internal representation.
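
   As a rough illustration of the target internal representation, here is a toy
Java parser (hypothetical; a real version would extend Calcite's SQL parser
rather than split on whitespace) that turns the `sumX BIGINT AGGREGATE USING SUM`
form into a name, a storage type, and an optional aggregate function. It does
not handle parameterized types containing spaces, such as `DECIMAL(10, 2)`.

```java
import java.util.Locale;

// Hypothetical result of parsing one column declaration.
record ColumnDecl(String name, String storageType, String aggregator) {
  boolean isMeasure() {
    return aggregator != null;
  }
}

final class ColumnDeclParser {
  private ColumnDeclParser() {}

  // Accepts "name TYPE" or "name TYPE AGGREGATE USING FN" (case-insensitive keywords).
  static ColumnDecl parse(String decl) {
    String[] tokens = decl.trim().split("\\s+");
    if (tokens.length == 2) {
      return new ColumnDecl(tokens[0], tokens[1].toUpperCase(Locale.ROOT), null);
    }
    if (tokens.length == 5
        && tokens[2].equalsIgnoreCase("AGGREGATE")
        && tokens[3].equalsIgnoreCase("USING")) {
      return new ColumnDecl(
          tokens[0],
          tokens[1].toUpperCase(Locale.ROOT),
          tokens[4].toUpperCase(Locale.ROOT));
    }
    throw new IllegalArgumentException("Unrecognized column declaration: " + decl);
  }
}
```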
   
   That syntax is what appears for the column "type" in the declaration in the
catalog. Now, in the catalog, we *could* break the type into parts: the storage
type (`BIGINT` in the `SUM` example), the input type (also `BIGINT`), and even
the initialization, aggregation, and reduction functions, if we prefer that the
user spell those out for each use.
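
   A hedged sketch of what that decomposition might look like, using a
hypothetical generic `MeasureDef` record: `SUM` over `BIGINT` has the same input
and storage type, starts at zero, adds each input value, and merges partial
results by adding them again. The part names (`init`, `aggregate`, `reduce`)
follow the wording above, not an existing Druid API.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Hypothetical decomposition of a measure "type" into its parts.
// I = input value type, S = stored (accumulator) value type.
record MeasureDef<I, S>(
    String storageType,             // e.g. BIGINT for SUM
    String inputType,               // e.g. also BIGINT for SUM
    Supplier<S> init,               // initial accumulator value
    BiFunction<S, I, S> aggregate,  // folds one input value into the accumulator
    BinaryOperator<S> reduce) {     // merges two partial accumulators

  // Folds a batch of input values into a single stored value.
  S aggregateAll(List<I> inputs) {
    S acc = init.get();
    for (I value : inputs) {
      acc = aggregate.apply(acc, value);
    }
    return acc;
  }
}

final class Measures {
  private Measures() {}

  // SUM over BIGINT: same input and storage type, starts at zero, and both
  // aggregation and reduction are plain addition.
  static final MeasureDef<Long, Long> SUM_BIGINT =
      new MeasureDef<Long, Long>("BIGINT", "BIGINT", () -> 0L, Long::sum, Long::sum);
}
```

For a sketch column, the input type would usually differ from the storage type,
which is one argument for keeping the two as separate parts.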
   
   So, for now, let's assume that "type" means whatever we need to specify, and 
that the type information has to be convertible to a Calcite type.
   
   To ingest a pre-built sketch, I assume that there is some text format? Maybe
a JSON string or a Base64-encoded binary value? If so, then we'd need a
function that converts from that string to an internal sketch object. That
function can be explicit, such as `CONVERT_TO_SKETCH(str)` or part of a SQL
`CAST(str AS SKETCH)`, or implicit: the only possible conversion from a string
to the sketch, if the sketch otherwise works with numeric types. Presumably we
have such functions already? If so, we can just use them.
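
   Assuming the Base64 option, an explicit conversion could be as small as the
following Java sketch. `SketchBytes`, `SketchConversions`, and `convertToSketch`
are hypothetical names; the "sketch" here merely wraps the decoded bytes,
whereas a real implementation would hand them to the sketch library's own
deserializer.

```java
import java.util.Base64;

// Hypothetical stand-in for a sketch object: it only holds the serialized
// bytes. A real implementation would deserialize them with the sketch
// library's API.
record SketchBytes(byte[] bytes) {}

final class SketchConversions {
  private SketchConversions() {}

  // Explicit conversion in the spirit of CONVERT_TO_SKETCH(str): decode a
  // Base64 string into the serialized sketch bytes.
  static SketchBytes convertToSketch(String base64) {
    if (base64 == null) {
      return null; // Propagate SQL NULL.
    }
    try {
      return new SketchBytes(Base64.getDecoder().decode(base64.trim()));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Not a valid Base64-encoded sketch: " + base64, e);
    }
  }
}
```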
   
   I wonder: do we have a list somewhere of the many aggregate, dimension, and
sketch types we support? That list can never be complete, I assume, since users
can write their own extensions. But it would at least help me understand what we
have today.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
