npawar commented on a change in pull request #5618:
URL: https://github.com/apache/incubator-pinot/pull/5618#discussion_r445915515



##########
File path: 
thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/auto/onboard/AutoOnboardPinotMetadataSource.java
##########
@@ -200,23 +201,26 @@ private void addNewDataset(String dataset, Schema schema, 
Map<String, String> cu
   /**
    * Refreshes an existing dataset in the thirdeye database
    * with any dimension/metric changes from pinot schema
-   * @param dataset
-   * @param schema
-   * @param datasetConfig
    */
-  private void refreshOldDataset(String dataset, Schema schema, Map<String, 
String> customConfigs,
-      DatasetConfigDTO datasetConfig) throws Exception {
-    checkDimensionChanges(dataset, datasetConfig, schema);
+  private void refreshOldDataset(String dataset, Schema schema, String 
timeColumnName,
+      Map<String, String> customConfigs, DatasetConfigDTO datasetConfig) {
+    checkDimensionChanges(dataset, datasetConfig, schema, timeColumnName);
     checkMetricChanges(dataset, datasetConfig, schema);
-    checkTimeFieldChanges(datasetConfig, schema);
+    checkTimeFieldChanges(datasetConfig, schema, timeColumnName);
     appendNewCustomConfigs(datasetConfig, customConfigs);
     checkNonAdditive(datasetConfig);
     datasetConfig.setActive(true);
   }
 
-  private void checkDimensionChanges(String dataset, DatasetConfigDTO 
datasetConfig, Schema schema) {
+  private void checkDimensionChanges(String dataset, DatasetConfigDTO 
datasetConfig, Schema schema,
+      String timeColumnName) {
     LOG.info("Checking for dimensions changes in {}", dataset);
-    List<String> schemaDimensions = schema.getDimensionNames();
+    List<String> schemaDimensions = new 
ArrayList<>(schema.getDimensionNames());
+     for (String dateTimeColumn : schema.getDateTimeNames()) { // treat all 
dateTimeFields specs as dimensions, except the primary time column

Review comment:
       up until now, Pinot had 3 types of columns - **DIMENSION**, **METRIC**, 
**TIME**. DIMENSION are a list, METRIC are a list, but TIME is only 1. 
   For example:
   ```
   {
     "schemaName": "flights",
     "dimensionFieldSpecs": [ {
         "name": "flightNumber",
         "dataType": "LONG"
       },
       {
         "name": "city",
         "dataType": "STRING"
       } ],
     "metricFieldSpecs": [ {
         "name": "price",
         "dataType": "DOUBLE"
       } ],
     "timeFieldSpec": {
         "incomingGranularitySpec": {
            "name": "millisSinceEpoch",
            "dataType": "LONG",
             "timeUnit": "MILLSECONDS"
         }
      }
   }
   ```
   I see in TE, dimensionFieldSpec get stored in `DatasetConfig`, 
metricFieldSpec in the `MetricsConfig`, and timeFieldSpec  also in the 
`DatasetConfig`. 
   
   
   Starting release 0.4.0, we have a new type of column **DATETIME**. This is 
set to replace **TIME** soon. **DATETIME** is a list. This means now there can 
be more than 1 time column. 
   For example:
   ```
   {
     "schemaName": "flights",
     "dimensionFieldSpecs": [ {
         "name": "flightNumber",
         "dataType": "LONG"
       },
       {
         "name": "city",
         "dataType": "STRING"
       } ],
     "metricFieldSpecs": [ {
         "name": "price",
         "dataType": "DOUBLE"
       } ],
     "dateTimeFieldSpecs": [{
         "name": "millisSinceEpoch",
         "dataType": "LONG",
         "format": "1:MILLSECONDS:EPOCH",
         "granularity": "15:MINUTES"
       },
       {
         "name": "arrivalTime",
         "dataType": "LONG",
         "format": "1:MILLISECONDS:EPOCH",
         "granularity": "1:MILLISECONDS"
       },
       {
         "name": "departureTime,
         "dataType": "LONG",
         "format": "1:MILLISECONDS:EPOCH",
         "granularity": "1:MILLISECONDS"
       }]
   }
   ```
   In TE, the `DatasetConfig` has place only for a single time column. So for 
the remaining **DATETIME** from the list, we have following options:
   1. ignore them - but this won't be correct for users
   2. add them as dimensions - this is probably the only suitable place for 
these columns as of now
   3. add some new construct in TE to keep the multiple time columns.
   
   So this would actually be a question for the TE team, how do you think 
should the new columns be stored?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to