GWphua commented on issue #16957:
URL: https://github.com/apache/druid/issues/16957#issuecomment-2464033572
Hi @johnImply, I would like your (or the community's) advice on the
following matter:

Currently, we can display a column in datetime format by applying
`MILLIS_TO_TIMESTAMP`. For example, on Druid's provided example dataset
"wikipedia", I created a secondary timestamp `sec_timestamp` with
`PARSE_TIME("timestamp")`; a subsequent
`MILLIS_TO_TIMESTAMP("sec_timestamp")` then produces the results you were
probably expecting.
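For reference, the workaround looks roughly like this (a hypothetical query; `sec_timestamp` is the secondary column described above):

```sql
-- Assuming "sec_timestamp" holds epoch milliseconds,
-- this renders it as a readable timestamp at query time.
SELECT MILLIS_TO_TIMESTAMP("sec_timestamp") AS "sec_ts"
FROM "wikipedia"
LIMIT 5
```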

Upon reviewing the code, I discovered that Druid’s conversion from SQL Type
to Druid Type is non-reversible. This is evident in the `Columns.java` class:
```java
public static final Map<String, ColumnType> SQL_TO_DRUID_TYPES =
    new ImmutableMap.Builder<String, ColumnType>()
        .put(SQL_TIMESTAMP, ColumnType.LONG)
        .put(SQL_BIGINT, ColumnType.LONG)
        .put(SQL_FLOAT, ColumnType.FLOAT)
        .put(SQL_DOUBLE, ColumnType.DOUBLE)
        .put(SQL_VARCHAR, ColumnType.STRING)
        .put(SQL_VARCHAR_ARRAY, ColumnType.STRING_ARRAY)
        .put(SQL_BIGINT_ARRAY, ColumnType.LONG_ARRAY)
        .put(SQL_FLOAT_ARRAY, ColumnType.FLOAT_ARRAY)
        .put(SQL_DOUBLE_ARRAY, ColumnType.DOUBLE_ARRAY)
        .build();

public static final Map<ColumnType, String> DRUID_TO_SQL_TYPES =
    new ImmutableMap.Builder<ColumnType, String>()
        .put(ColumnType.LONG, SQL_BIGINT)
        .put(ColumnType.FLOAT, SQL_FLOAT)
        .put(ColumnType.DOUBLE, SQL_DOUBLE)
        .put(ColumnType.STRING, SQL_VARCHAR)
        .put(ColumnType.STRING_ARRAY, SQL_VARCHAR_ARRAY)
        .put(ColumnType.LONG_ARRAY, SQL_BIGINT_ARRAY)
        .put(ColumnType.FLOAT_ARRAY, SQL_FLOAT_ARRAY)
        .put(ColumnType.DOUBLE_ARRAY, SQL_DOUBLE_ARRAY)
        .build();
```
As we can see, `SQL_TIMESTAMP` is converted to `ColumnType.LONG`. However,
when we attempt to retrieve the Druid type after ingestion, this conversion
creates a limitation: every `ColumnType.LONG` entry maps back to
`SQL_BIGINT`. This is why our timestamps are displayed as SQL BIGINT.

Currently, Druid identifies a timestamp column solely by checking whether its
name matches the string `__time`:
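To make the lossiness concrete, here is a toy sketch (not Druid's actual code, and using plain strings instead of `ColumnType`): because both `TIMESTAMP` and `BIGINT` collapse to the same Druid storage type, the reverse lookup can only ever return one of them.

```java
import java.util.Map;

// Toy illustration of the non-reversible mapping: TIMESTAMP and BIGINT both
// map to LONG, so the reverse map has to pick a single SQL type per Druid type.
public class LossyTypeMapping {
    static final Map<String, String> SQL_TO_DRUID = Map.of(
        "TIMESTAMP", "LONG",
        "BIGINT", "LONG"
    );

    // Only one entry is possible for "LONG"; the TIMESTAMP origin is lost.
    static final Map<String, String> DRUID_TO_SQL = Map.of(
        "LONG", "BIGINT"
    );

    public static void main(String[] args) {
        String druidType = SQL_TO_DRUID.get("TIMESTAMP");   // "LONG"
        String roundTripped = DRUID_TO_SQL.get(druidType);  // "BIGINT", not "TIMESTAMP"
        System.out.println(roundTripped);
    }
}
```

Running this prints `BIGINT`: the round trip `TIMESTAMP -> LONG -> BIGINT` never recovers the original SQL type.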
```java
public static final String TIME_COLUMN = "__time";

public static String sqlType(ColumnSpec spec)
{
  if (isTimeColumn(spec.name())) {
    return SQL_TIMESTAMP;
  }
  ColumnType druidType = druidType(spec);
  if (druidType == null) {
    return null;
  }
  String sqlType = DRUID_TO_SQL_TYPES.get(druidType);
  return sqlType == null ? druidType.asTypeString() : sqlType;
}

public static boolean isTimeColumn(String name)
{
  return TIME_COLUMN.equals(name);
}
```
To enable Druid to recognize secondary timestamps, we would need to store
additional metadata about the column during ingestion. This could involve
creating a new timestamp `ColumnType`, implementing a verification method
within `ColumnCapabilities`, or both.

I feel these approaches would be very complicated for something that already
has a workaround. There is also the risk that, should I continue working on
this, I could change the behaviour of the `timestamp` type itself, since it is
coupled to our primary timestamp `__time`.
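As a rough illustration of the metadata-based direction (purely hypothetical; none of these names exist in Druid today), the idea would be a per-column flag recorded at ingestion time and consulted when mapping Druid types back to SQL types:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a per-column "is timestamp" flag stored as extra
// metadata at ingestion time. A secondary timestamp keeps its LONG storage
// but reports TIMESTAMP when its type is queried.
public class TimestampFlagSketch {
    // stand-in for column metadata captured during ingestion
    static final Map<String, Boolean> IS_TIMESTAMP = new HashMap<>();

    static String sqlType(String columnName, String druidType) {
        if (Boolean.TRUE.equals(IS_TIMESTAMP.get(columnName))) {
            return "TIMESTAMP";
        }
        // otherwise fall back to the existing lossy mapping
        return "LONG".equals(druidType) ? "BIGINT" : druidType;
    }

    public static void main(String[] args) {
        IS_TIMESTAMP.put("sec_timestamp", true);
        System.out.println(sqlType("sec_timestamp", "LONG"));
        System.out.println(sqlType("added", "LONG"));
    }
}
```

This would let `sec_timestamp` surface as TIMESTAMP while an ordinary LONG column such as `added` still surfaces as BIGINT, but it requires threading the flag through ingestion, segment metadata, and the SQL layer.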
I would like to hear your thoughts on this matter. đŸ˜„
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]