GWphua commented on issue #16957:
URL: https://github.com/apache/druid/issues/16957#issuecomment-2464033572
Hi @johnImply, I would like your (or the community's) advice on the
following matter:

Currently, we can display a column in datetime format by applying
`MILLIS_TO_TIMESTAMP`. For example, on Druid's provided example dataset
"wikipedia", I created a secondary timestamp `sec_timestamp` with
`PARSE_TIME("timestamp")`; a subsequent
`MILLIS_TO_TIMESTAMP("sec_timestamp")` then produces the results you were
probably expecting.
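For reference, the workaround looks roughly like this (a hypothetical query; `sec_timestamp` is the secondary column described above):

```sql
-- Assuming "sec_timestamp" holds epoch milliseconds,
-- this renders it as a readable timestamp at query time.
SELECT MILLIS_TO_TIMESTAMP("sec_timestamp") AS "sec_ts"
FROM "wikipedia"
LIMIT 5
```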

Upon reviewing the code, I discovered that Druid’s conversion from SQL Type
to Druid Type is non-reversible. This is evident in the `Columns.java` class:
```java
public static final Map<String, ColumnType> SQL_TO_DRUID_TYPES =
    new ImmutableMap.Builder<String, ColumnType>()
        .put(SQL_TIMESTAMP, ColumnType.LONG)
        .put(SQL_BIGINT, ColumnType.LONG)
        .put(SQL_FLOAT, ColumnType.FLOAT)
        .put(SQL_DOUBLE, ColumnType.DOUBLE)
        .put(SQL_VARCHAR, ColumnType.STRING)
        .put(SQL_VARCHAR_ARRAY, ColumnType.STRING_ARRAY)
        .put(SQL_BIGINT_ARRAY, ColumnType.LONG_ARRAY)
        .put(SQL_FLOAT_ARRAY, ColumnType.FLOAT_ARRAY)
        .put(SQL_DOUBLE_ARRAY, ColumnType.DOUBLE_ARRAY)
        .build();

public static final Map<ColumnType, String> DRUID_TO_SQL_TYPES =
    new ImmutableMap.Builder<ColumnType, String>()
        .put(ColumnType.LONG, SQL_BIGINT)
        .put(ColumnType.FLOAT, SQL_FLOAT)
        .put(ColumnType.DOUBLE, SQL_DOUBLE)
        .put(ColumnType.STRING, SQL_VARCHAR)
        .put(ColumnType.STRING_ARRAY, SQL_VARCHAR_ARRAY)
        .put(ColumnType.LONG_ARRAY, SQL_BIGINT_ARRAY)
        .put(ColumnType.FLOAT_ARRAY, SQL_FLOAT_ARRAY)
        .put(ColumnType.DOUBLE_ARRAY, SQL_DOUBLE_ARRAY)
        .build();
```
As we can see, `SQL_TIMESTAMP` is converted to `ColumnType.LONG`. However,
when we attempt to retrieve the Druid type after ingestion, this conversion
creates a limitation: every `ColumnType.LONG` entry maps back to
`SQL_BIGINT`. This is why our timestamps are displayed as SQL BIGINT.

Currently, Druid identifies a timestamp column solely by checking whether its
name matches the string `__time`:
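To make the lossiness concrete, here is a toy sketch (not Druid's actual code, and using plain strings instead of `ColumnType`): because both `TIMESTAMP` and `BIGINT` collapse to the same Druid storage type, the reverse lookup can only ever return one of them.

```java
import java.util.Map;

// Toy illustration of the non-reversible mapping: TIMESTAMP and BIGINT both
// map to LONG, so the reverse map has to pick a single SQL type per Druid type.
public class LossyTypeMapping {
    static final Map<String, String> SQL_TO_DRUID = Map.of(
        "TIMESTAMP", "LONG",
        "BIGINT", "LONG"
    );

    // Only one entry is possible for "LONG"; the TIMESTAMP origin is lost.
    static final Map<String, String> DRUID_TO_SQL = Map.of(
        "LONG", "BIGINT"
    );

    public static void main(String[] args) {
        String druidType = SQL_TO_DRUID.get("TIMESTAMP");   // "LONG"
        String roundTripped = DRUID_TO_SQL.get(druidType);  // "BIGINT", not "TIMESTAMP"
        System.out.println(roundTripped);
    }
}
```

Running this prints `BIGINT`: the round trip `TIMESTAMP -> LONG -> BIGINT` never recovers the original SQL type.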
```java
public static final String TIME_COLUMN = "__time";

public static String sqlType(ColumnSpec spec)
{
  if (isTimeColumn(spec.name())) {
    return SQL_TIMESTAMP;
  }
  ColumnType druidType = druidType(spec);
  if (druidType == null) {
    return null;
  }
  String sqlType = DRUID_TO_SQL_TYPES.get(druidType);
  return sqlType == null ? druidType.asTypeString() : sqlType;
}

public static boolean isTimeColumn(String name)
{
  return TIME_COLUMN.equals(name);
}
```
To enable Druid to recognize secondary timestamps, we would need to store
additional metadata about the column during ingestion. This could involve
creating a new timestamp `ColumnType`, implementing a verification method
within `ColumnCapabilities`, or both.

I feel these approaches would be very complicated for something that already
has a workaround. There is also the risk that, should I continue working on
this, I could change the behaviour of the `timestamp` type itself, since it is
coupled to our primary timestamp `__time`.
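As a rough illustration of the metadata-based direction (purely hypothetical; none of these names exist in Druid today), the idea would be a per-column flag recorded at ingestion time and consulted when mapping Druid types back to SQL types:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a per-column "is timestamp" flag stored as extra
// metadata at ingestion time. A secondary timestamp keeps its LONG storage
// but reports TIMESTAMP when its type is queried.
public class TimestampFlagSketch {
    // stand-in for column metadata captured during ingestion
    static final Map<String, Boolean> IS_TIMESTAMP = new HashMap<>();

    static String sqlType(String columnName, String druidType) {
        if (Boolean.TRUE.equals(IS_TIMESTAMP.get(columnName))) {
            return "TIMESTAMP";
        }
        // otherwise fall back to the existing lossy mapping
        return "LONG".equals(druidType) ? "BIGINT" : druidType;
    }

    public static void main(String[] args) {
        IS_TIMESTAMP.put("sec_timestamp", true);
        System.out.println(sqlType("sec_timestamp", "LONG"));
        System.out.println(sqlType("added", "LONG"));
    }
}
```

This would let `sec_timestamp` surface as TIMESTAMP while an ordinary LONG column such as `added` still surfaces as BIGINT, but it requires threading the flag through ingestion, segment metadata, and the SQL layer.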
I would like to hear your thoughts on this matter. đŸ˜„
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]