paul-rogers commented on code in PR #12647: URL: https://github.com/apache/druid/pull/12647#discussion_r919471739
########## server/src/main/java/org/apache/druid/catalog/DatasourceColumnSpec.java: ########## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.catalog; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonSubTypes; +import com.fasterxml.jackson.annotation.JsonSubTypes.Type; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import org.apache.druid.java.util.common.IAE; +import org.apache.druid.java.util.common.StringUtils; + +/** + * Description of a detail datasource column and a rollup + * dimension column. 
+ */ +@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type") +@JsonSubTypes(value = { + @Type(name = "detail", value = DatasourceColumnSpec.DetailColumnSpec.class), + @Type(name = "dimension", value = DatasourceColumnSpec.DimensionSpec.class), + @Type(name = "measure", value = DatasourceColumnSpec.MeasureSpec.class), +}) +public abstract class DatasourceColumnSpec extends ColumnSpec +{ + private static final String TIME_COLUMN = "__time"; + + @JsonCreator + public DatasourceColumnSpec( + @JsonProperty("name") String name, + @JsonProperty("sqlType") String sqlType + ) + { + super(name, sqlType); + } + + @Override + public void validate() + { + super.validate(); + if (sqlType == null) { + return; + } + if (TIME_COLUMN.equals(name)) { + if (!"TIMESTAMP".equalsIgnoreCase(sqlType)) { + throw new IAE("__time column must have type TIMESTAMP"); + } + } else if (!VALID_SQL_TYPES.containsKey(StringUtils.toUpperCase(sqlType))) { Review Comment: @clintropolis, thanks for the note. The catalog work is not yet to this level of detail, and I look forward to discussing the specifics when we get there. (Which will be soon; for now, the work is still wrestling with the input table side of things.) We can, however, think about some constraints. One is on the planner side: Calcite (and SQL) know only two facts about any column: its name and type. Anything we want to say about the column (other than name) has to be embedded into the Calcite type. This is not much of a restriction, however: Calcite is pretty liberal about what types can do. In Drill, for example, we have types for things such as maps, dictionaries, variants, and n-dimensional arrays of any type. In the SQL world, some databases have an `INTERVAL` type with ranges: `INTERVAL MINUTE TO DAY`, say. So, to the degree that any thought has gone into the type system claim, it is just that we need a way to express aggregates (and dimensions) via types. 
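To make the "everything other than the name lives in the type" idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: `ColumnTypeSketch`, `CatalogType`, `Role`, and `describe()` are hypothetical names that exist in neither Druid nor Calcite, and the text rendering is invented. A real version would have to be convertible to a Calcite type.

```java
import java.util.Objects;

// Hypothetical illustration only: none of these classes exist in Druid or Calcite.
final class ColumnTypeSketch
{
  // Roles mirror the "detail", "dimension" and "measure" subtypes named in the diff.
  enum Role
  {
    DETAIL, DIMENSION, MEASURE
  }

  // Everything we want to say about a column, other than its name.
  static final class CatalogType
  {
    final String storageType;
    final Role role;
    final String aggregate; // null unless role == MEASURE

    private CatalogType(String storageType, Role role, String aggregate)
    {
      this.storageType = Objects.requireNonNull(storageType, "storageType");
      this.role = Objects.requireNonNull(role, "role");
      // A measure must name its aggregate; any other role must not.
      if ((role == Role.MEASURE) != (aggregate != null)) {
        throw new IllegalArgumentException("Only measures carry an aggregate function");
      }
      this.aggregate = aggregate;
    }

    static CatalogType of(String storageType, Role role)
    {
      return new CatalogType(storageType, role, null);
    }

    static CatalogType measure(String storageType, String aggregate)
    {
      return new CatalogType(storageType, Role.MEASURE, Objects.requireNonNull(aggregate, "aggregate"));
    }

    // Invented text form, used here only to show that the type alone carries the details.
    String describe()
    {
      return role == Role.MEASURE
             ? storageType + " MEASURE(" + aggregate + ")"
             : storageType + " " + role;
    }
  }

  // The planner's view of a column: just a name and a type.
  static final class Column
  {
    final String name;
    final CatalogType type;

    Column(String name, CatalogType type)
    {
      this.name = Objects.requireNonNull(name, "name");
      this.type = Objects.requireNonNull(type, "type");
    }
  }

  public static void main(String[] args)
  {
    Column sumX = new Column("sumX", CatalogType.measure("BIGINT", "SUM"));
    Column country = new Column("country", CatalogType.of("VARCHAR", Role.DIMENSION));
    System.out.println(sumX.name + ": " + sumX.type.describe());
    System.out.println(country.name + ": " + country.type.describe());
  }
}
```

The point of the sketch is that `Column` has exactly two fields, so the aggregate function and the role have nowhere to go except inside the type object.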
In fact, we must already express aggregates via types for the new `INSERT` functionality: it is on the "to do" list to review what we did there and reuse it.

Then, we need a way to express that type syntactically. The syntax noted above is one way. Gian suggests a more traditional spelling out of the details: `sumX BIGINT AGGREGATE USING SUM`, say. The syntax can be anything we want, as long as we can coax Calcite to parse it into our internal representation. That syntax is what appears for the column "type" in the declaration in the catalog.

Now, in the catalog, we *could* break the type into parts: the storage type (`BIGINT` in the `SUM` example), the input type (also `BIGINT`), even the initialization, aggregation, and reduction functions, if we prefer that the user spell those out for each use. So, for now, let's assume that "type" means whatever we need to specify, and that the type information has to be convertible to a Calcite type.

To ingest a pre-built sketch, I assume that there is some text format? Maybe a JSON string or a Base64-encoded binary value? If so, then we'd need a function that converts from that string to an internal sketch object. That function can be explicit (`CONVERT_TO_SKETCH(str)`), part of SQL (`CAST(str AS SKETCH)`), or implicit: the only possible conversion from a string to the sketch, if the sketch otherwise works with numeric types. Presumably we have such functions already? If so, we can just use them.

I wonder, do we have a list somewhere of the many aggregate, dimension, and sketch types we support? That list can never be complete, I assume, since users can write their own extensions. But it would at least help me understand what we have today.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
