gianm commented on code in PR #12599:
URL: https://github.com/apache/druid/pull/12599#discussion_r940711670


##########
core/src/main/java/org/apache/druid/metadata/MetadataStorageConnector.java:
##########
@@ -88,4 +88,22 @@ default void exportTable(
   void createSupervisorsTable();
 
   void deleteAllRecords(String tableName);
+
+  /**
+   * Upgrade Compatability Method.
+   *
+   * A new column, used_flag_last_updated, is added to druid_segmens table. This method alters the table to add the column to make

Review Comment:
   `druid_segments` (spelling)



##########
core/src/main/java/org/apache/druid/metadata/MetadataStorageConnector.java:
##########
@@ -88,4 +88,22 @@ default void exportTable(
   void createSupervisorsTable();
 
   void deleteAllRecords(String tableName);
+
+  /**
+   * Upgrade Compatability Method.

Review Comment:
   compatibility (spelling)



##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentPublisher.java:
##########
@@ -73,7 +73,8 @@ public void publishSegment(final DataSegment segment) throws IOException
         (segment.getShardSpec() instanceof NoneShardSpec) ? false : true,
         segment.getVersion(),
         true,
-        jsonMapper.writeValueAsBytes(segment)
+        jsonMapper.writeValueAsBytes(segment),
+        DateTimes.nowUtc().toString()

Review Comment:
   Would be tidier to call `DateTimes.nowUtc()` just once.
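
A minimal sketch of that suggestion, with `java.time.Instant` standing in for Druid's `DateTimes.nowUtc()` (the class and method names here are hypothetical, not from the PR):

```java
import java.time.Instant;

public class PublishTimestamp {
    // Capture the timestamp once and reuse it, so every column written in
    // the same insert (e.g. created_date and used_flag_last_updated) carries
    // the identical value.
    static String[] rowTimestamps() {
        final String now = Instant.now().toString();
        return new String[]{now, now};
    }

    public static void main(String[] args) {
        String[] ts = rowTimestamps();
        System.out.println(ts[0].equals(ts[1]));  // the two columns always match
    }
}
```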



##########
docs/configuration/index.md:
##########
@@ -816,6 +816,7 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 |`druid.coordinator.kill.period`|How often to send kill tasks to the indexing service. Value must be greater than `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|P1D (1 Day)|
 |`druid.coordinator.kill.durationToRetain`|Only applies if you set `druid.coordinator.kill.on` to `true`. This value is ignored if `druid.coordinator.kill.ignoreDurationToRetain` is `true`. Valid configurations must be a ISO8601 period. Druid will not kill unused segments whose interval end date is beyond `now - durationToRetain`. `durationToRetain` can be a negative ISO8601 period, which would result in `now - durationToRetain` to be in the future.<br /><br />Note that the `durationToRetain` parameter applies to the segment interval, not the time that the segment was last marked unused. For example, if `durationToRetain` is set to `P90D`, then a segment for a time chunk 90 days in the past is eligible for permanent deletion immediately after being marked unused.|`P90D`|
 |`druid.coordinator.kill.ignoreDurationToRetain`|A way to override `druid.coordinator.kill.durationToRetain` and tell the coordinator that you do not care about the end date of unused segment intervals when it comes to killing them. If true, the coordinator considers all unused segments as eligible to be killed.|false|
+|`druid.coordinator.kill.bufferPeriod`|The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused.|`PT24H`|

Review Comment:
   24h seems aggressive as a default. Personally I'd be more comfortable with 
30 days, if it's a cluster I'm operating.
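
For reference, the two candidate defaults as ISO-8601 period strings (parsed here with `java.time` as a stand-in; Druid's own period handling is Joda-based):

```java
import java.time.Duration;
import java.time.Period;

public class BufferPeriods {
    public static void main(String[] args) {
        // PT24H: the default in the diff above, a time-based 24-hour duration.
        Duration current = Duration.parse("PT24H");
        // P30D: the reviewer's preference, a date-based 30-day period.
        Period suggested = Period.parse("P30D");
        System.out.println(current.toHours() + " hours vs " + suggested.getDays() + " days");
    }
}
```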



##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -464,6 +501,52 @@ tableName, getSerialType(), getPayloadType()
     );
   }
 
+  /**
+   * Adds the used_flag_last_updated column to the Druid segment table.
+   *
+   * This is public due to allow the UpdateTables cli tool to use for upgrade prep.
+   */
+  @Override
+  public void alterSegmentTableAddUsedFlagLastUpdated()
+  {
+    String tableName = tablesConfigSupplier.get().getSegmentsTable();
+    if (!tableHasColumn(tableName, "used_flag_last_updated")) {

Review Comment:
   So far, the only ALTERs we have are to add columns. Definitely sounds 
reasonable to specialize the general method in that direction. It'd also help 
with the log messages, since the method would become aware of what column wants 
to be added.
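
A hypothetical shape for that specialization (names and signature are illustrative only, not from the PR): because the method knows which column it is adding, it can build the DDL and emit a log line naming the column and table.

```java
public class AddColumnSketch {
    // Build the ALTER statement for a single new column.
    static String addColumnDdl(String tableName, String columnName, String columnType) {
        return "ALTER TABLE " + tableName + " ADD " + columnName + " " + columnType;
    }

    public static void main(String[] args) {
        String ddl = addColumnDdl("druid_segments", "used_flag_last_updated", "varchar(255)");
        // A column-aware log message in the style the review describes:
        System.out.println("Adding column[used_flag_last_updated] to table[druid_segments]");
        System.out.println(ddl);
    }
}
```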



##########
services/src/main/java/org/apache/druid/cli/UpdateTables.java:
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.cli;
+
+import com.github.rvesse.airline.annotations.Command;
+import com.github.rvesse.airline.annotations.Option;
+import com.github.rvesse.airline.annotations.restrictions.Required;
+import com.google.common.collect.ImmutableList;
+import com.google.inject.Injector;
+import com.google.inject.Key;
+import com.google.inject.Module;
+import org.apache.druid.guice.DruidProcessingModule;
+import org.apache.druid.guice.JsonConfigProvider;
+import org.apache.druid.guice.QueryRunnerFactoryModule;
+import org.apache.druid.guice.QueryableModule;
+import org.apache.druid.guice.annotations.Self;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.metadata.MetadataStorageConnector;
+import org.apache.druid.metadata.MetadataStorageConnectorConfig;
+import org.apache.druid.metadata.MetadataStorageTablesConfig;
+import org.apache.druid.server.DruidNode;
+
+import java.util.List;
+
+@Command(
+    name = "metadata-update",
+    description = "Controlled update of metadta storage"

Review Comment:
   metadata (spelling)



##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -212,6 +213,41 @@ public Void withHandle(Handle handle)
     }
   }
 
+  /**
+   * Execute the desired ALTER statement on the desired table
+   *
+   * @param tableName The name of the table being altered
+   * @param sql ALTER statment to be executed
+   */
+  private void alterTable(final String tableName, final Iterable<String> sql)

Review Comment:
   Refactor `alterEntryTable` to use this?
   
   I like that `alterEntryTable` has nice log messages that mention the 
specific update that is being made, in natural language, like "Adding column: 
type to table[tasks]". Would be good to combine the nice log messages of that 
method with the genericness of this method.
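
One possible shape for that combination (all names hypothetical, and the database call is stubbed out): the caller that knows the column supplies the natural-language log line, then delegates execution to the generic method.

```java
import java.util.Collections;
import java.util.List;

public class AlterDelegation {
    static String lastExecuted;

    // Generic executor, as in the diff above; in the real connector this
    // would run each statement inside a DB handle. Here we only record it.
    static void alterTable(String tableName, List<String> statements) {
        for (String sql : statements) {
            lastExecuted = sql;
        }
    }

    // Column-aware caller: logs in natural language, then delegates.
    static void alterEntryTableAddColumn(String tableName, String columnName, String columnType) {
        System.out.println("Adding column: " + columnName + " to table[" + tableName + "]");
        alterTable(tableName, Collections.singletonList(
            "ALTER TABLE " + tableName + " ADD " + columnName + " " + columnType));
    }

    public static void main(String[] args) {
        alterEntryTableAddColumn("tasks", "type", "varchar(255)");
        System.out.println(lastExecuted);
    }
}
```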



##########
server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java:
##########
@@ -993,7 +1000,7 @@ public List<Interval> inTransaction(Handle handle, TransactionStatus status)
                 .createQuery(
                     StringUtils.format(
                         "SELECT start, %2$send%2$s FROM %1$s WHERE dataSource = :dataSource AND "
-                        + "%2$send%2$s <= :end AND used = false ORDER BY start, %2$send%2$s",
+                        + "%2$send%2$s <= :end AND used = false AND (used_flag_last_updated IS NULL or used_flag_last_updated <= :used_flag_last_updated) ORDER BY start, %2$send%2$s",

Review Comment:
   I think we can get somewhat better behavior in the NULL case by using the 
created_date:
   
   ```
   used = false AND ((used_flag_last_updated IS NULL AND created_date <= :used_flag_last_updated) OR used_flag_last_updated <= :used_flag_last_updated)
   ```
   
   A little closer to the behavior we'd get if the `used_flag_last_updated` was 
actually populated.



##########
server/src/main/java/org/apache/druid/metadata/SQLMetadataConnector.java:
##########
@@ -824,4 +911,59 @@ public Void withHandle(Handle handle)
       log.warn(e, "Exception while deleting records from table");
     }
   }
+
+  /**
+   * Interrogate table metadata and return true or false depending on the existance of the indicated column
+   *
+   * public visibility because DerbyConnector needs to override thanks to uppercase table and column names invalidating
+   * this implementation.
+   *
+   * @param tableName The table being interrogated
+   * @param columnName The column being looked for
+   * @return boolean indicating the existence of the column in question
+   */
+  public boolean tableHasColumn(String tableName, String columnName)
+  {
+    return getDBI().withHandle(
+        new HandleCallback<Boolean>()
+        {
+          @Override
+          public Boolean withHandle(Handle handle)
+          {
+            try {
+              if (tableExists(handle, tableName)) {
+                DatabaseMetaData dbMetaData = handle.getConnection().getMetaData();
+                ResultSet columns = dbMetaData.getColumns(
+                    null,
+                    null,
+                    tableName,
+                    columnName
+                );
+                return columns.next();
+              } else {
+                return false;
+              }
+            }
+            catch (SQLException e) {
+              return false;
+            }
+          }
+        }
+    );
+  }
+
+  /**
+   * Ensure that the segment table has the proper schema required to run Druid properly.
+   *
+   * Throws RuntimeException if the column does not exist. There is no recovering from an invalid schema,
+   * the program should crash.
+   */
+  private void validateSegmentTable()
+  {
+    if (tableHasColumn(tablesConfigSupplier.get().getSegmentsTable(), "used_flag_last_updated")) {
+      return;
+    } else {
+      throw new RuntimeException("Invalid Segment Table Schema! No used_flag_last_updated column!");

Review Comment:
   Would be good to link to the docs here about how to do the migration 
manually. We can write the link out even though it won't be valid until after 
the release is made: 
https://druid.apache.org/docs/latest/operations/upgrade-prep.html
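
A sketch of an error message carrying that link (the message wording and helper name here are illustrative, not from the PR):

```java
public class SchemaError {
    // Point operators at the migration docs from the failure itself; the
    // link becomes valid once the release is published.
    static RuntimeException missingColumn(String tableName) {
        return new RuntimeException(
            "Invalid segment table schema: table[" + tableName + "] has no used_flag_last_updated column. "
            + "See https://druid.apache.org/docs/latest/operations/upgrade-prep.html to migrate manually."
        );
    }

    public static void main(String[] args) {
        System.out.println(missingColumn("druid_segments").getMessage());
    }
}
```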



##########
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java:
##########
@@ -1024,7 +1024,8 @@ private Set<DataSegment> announceHistoricalSegmentBatch(
              .bind("partitioned", (segment.getShardSpec() instanceof NoneShardSpec) ? false : true)
               .bind("version", segment.getVersion())
               .bind("used", usedSegments.contains(segment))
-              .bind("payload", jsonMapper.writeValueAsBytes(segment));
+              .bind("payload", jsonMapper.writeValueAsBytes(segment))
+              .bind("used_flag_last_updated", DateTimes.nowUtc().toString());

Review Comment:
   For niceness, it'd be good to have this `DateTimes.nowUtc()` grabbed once so 
it's always going to be the same for `created_date` and 
`used_flag_last_updated`. Doesn't really matter, but it's tidier.



##########
core/src/main/java/org/apache/druid/metadata/MetadataStorageConnector.java:
##########
@@ -88,4 +88,22 @@ default void exportTable(
   void createSupervisorsTable();
 
   void deleteAllRecords(String tableName);
+
+  /**
+   * Upgrade Compatability Method.
+   *
+   * A new column, used_flag_last_updated, is added to druid_segmens table. This method alters the table to add the column to make
+   * a cluster's metastore tables compatible with the updated Druid codebase in 0.24.x+
+   */
+  void alterSegmentTableAddUsedFlagLastUpdated();
+
+  /**
+   * Upgrade Compatability Method.

Review Comment:
   compatibility (spelling)



##########
docs/operations/upgrade-prep.md:
##########
@@ -0,0 +1,114 @@
+---
+id: upgrade-prep
+title: "Upgrade Prep"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+  
+## Upgrade to `0.24+` from `0.23` and earlier
+
+### Altering segments table
+
+**If you have set `druid.metadata.storage.connector.createTables` to `true` (which is the default), and your metadata connect user has DDL privileges, you can disregard this section. You are urged to still evaluate the optional section below**
+
+**The coordinator and overlord services will fail if you do not execute this change prior to the upgrade**
+
+A new column, `used_flag_last_updated`, is needed in the segments table to support new
+segment killing functionality. You can manually alter the table, or you can use
+a CLI tool to perform the update.
+
+#### CLI tool
+
+Druid provides a `metadata-update` tool for updating Druid's metadata tables.
+
+In the example commands below:
+
+- `lib` is the Druid lib directory
+- `extensions` is the Druid extensions directory
+- `base` corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
+- The `--connectURI` parameter corresponds to the value of `druid.metadata.storage.connector.connectURI`.
+- The `--user` parameter corresponds to the value of `druid.metadata.storage.connector.user`.
+- The `--password` parameter corresponds to the value of `druid.metadata.storage.connector.password`.
+- The `--action` parameter corresponds to the update action you are executing. In this case it is: `add-last-used-to-segments`
+
+##### MySQL
+
+```bash
+cd ${DRUID_ROOT}
+java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[\"mysql-metadata-storage\"] -Ddruid.metadata.storage.type=mysql org.apache.druid.cli.Main tools metadata-update --connectURI="<mysql-uri>" --user <user> --password <pass> --base druid --action add-used-flag-last-updated-to-segments
+```
+
+##### PostgreSQL
+
+```bash
+cd ${DRUID_ROOT}
+java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[\"postgresql-metadata-storage\"] -Ddruid.metadata.storage.type=postgresql org.apache.druid.cli.Main tools metadata-update --connectURI="<postgresql-uri>" --user <user> --password <pass> --base druid --action add-used-flag-last-updated-to-segments
+```
+
+
+#### Manual ALTER TABLE
+
+```SQL
+ALTER TABLE druid_segments
+ADD used_flag_last_updated varchar(255);
+```
+
+### Populating `used_flag_last_updated` column of the segments table after upgrade (Optional)

Review Comment:
   I suggest adding a name here so we can link to this specific section more 
easily. Like:
   
   ```
   <a name="populate-used-flag-last-updated"></a>
   ```
   
   It goes before the `### Populating...` line.
   
   First place I'd link to it is the text "optional section below" above. That can be done like:
   
   ```
   [optional section below](#populate-used-flag-last-updated)
   ```



##########
docs/configuration/index.md:
##########
@@ -816,6 +816,7 @@ These Coordinator static configurations can be defined in the `coordinator/runti
 |`druid.coordinator.kill.period`|How often to send kill tasks to the indexing service. Value must be greater than `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|P1D (1 Day)|
 |`druid.coordinator.kill.durationToRetain`|Only applies if you set `druid.coordinator.kill.on` to `true`. This value is ignored if `druid.coordinator.kill.ignoreDurationToRetain` is `true`. Valid configurations must be a ISO8601 period. Druid will not kill unused segments whose interval end date is beyond `now - durationToRetain`. `durationToRetain` can be a negative ISO8601 period, which would result in `now - durationToRetain` to be in the future.<br /><br />Note that the `durationToRetain` parameter applies to the segment interval, not the time that the segment was last marked unused. For example, if `durationToRetain` is set to `P90D`, then a segment for a time chunk 90 days in the past is eligible for permanent deletion immediately after being marked unused.|`P90D`|
 |`druid.coordinator.kill.ignoreDurationToRetain`|A way to override `druid.coordinator.kill.durationToRetain` and tell the coordinator that you do not care about the end date of unused segment intervals when it comes to killing them. If true, the coordinator considers all unused segments as eligible to be killed.|false|
+|`druid.coordinator.kill.bufferPeriod`|The amount of time that a segment must be unused before it is able to be permanently removed from metadata and deep storage. This can serve as a buffer period to prevent data loss if data ends up being needed after being marked unused.|`PT24H`|

Review Comment:
   We should describe that if you updated from an earlier version, this won't 
apply to all segments. We can also link from this to the doc about how to do 
the manual update. Maybe like:
   
   > The amount of time that a segment must be unused before it is able to be 
permanently removed from metadata and deep storage. This can serve as a buffer 
period to prevent data loss if data ends up being needed after being marked 
unused.`<br /><br />`Segments created with Druid 0.23.x and earlier do not have 
the necessary metadata required to respect this configuration, and may be 
permanently removed even if marked used or unused within the buffer period. To 
address this, you can `[populate this field in existing segment 
records](../operations/upgrade-prep.md#populate-used-flag-last-updated)` using 
a command-line tool or SQL command.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

