techdocsmith commented on a change in pull request #10935:
URL: https://github.com/apache/druid/pull/10935#discussion_r596182816
##########
File path: docs/ingestion/data-management.md
##########
@@ -21,173 +21,9 @@ title: "Data management"
~ specific language governing permissions and limitations
~ under the License.
-->
+Within the context of this topic Data management refers to Apache Druid's data maintenance capabilities for existing datasources. There are several options to help you keep your data relevant and to help your Druid cluster remain performant. For example updating, reingesting, adding lookups, reindexing, or deleting data.
-
-
-
-## Schema changes
-
-Schemas for datasources can change at any time and Apache Druid supports different schemas among segments.
-
-### Replacing segments
-
-Druid uniquely
-identifies segments using the datasource, interval, version, and partition number. The partition number is only visible in the segment id if
-there are multiple segments created for some granularity of time. For example, if you have hourly segments, but you
-have more data in an hour than a single segment can hold, you can create multiple segments for the same hour. These segments will share
-the same datasource, interval, and version, but have linearly increasing partition numbers.
-
-```
-foo_2015-01-01/2015-01-02_v1_0
-foo_2015-01-01/2015-01-02_v1_1
-foo_2015-01-01/2015-01-02_v1_2
-```
-
-In the example segments above, the dataSource = foo, interval = 2015-01-01/2015-01-02, version = v1, partitionNum = 0.
-If at some later point in time, you reindex the data with a new schema, the newly created segments will have a higher version id.
-
-```
-foo_2015-01-01/2015-01-02_v2_0
-foo_2015-01-01/2015-01-02_v2_1
-foo_2015-01-01/2015-01-02_v2_2
-```
-
-Druid batch indexing (either Hadoop-based or IndexTask-based) guarantees atomic updates on an interval-by-interval basis.
-In our example, until all `v2` segments for `2015-01-01/2015-01-02` are loaded in a Druid cluster, queries exclusively use `v1` segments.
-Once all `v2` segments are loaded and queryable, all queries ignore `v1` segments and switch to the `v2` segments.
-Shortly afterwards, the `v1` segments are unloaded from the cluster.
-
-Note that updates that span multiple segment intervals are only atomic within each interval. They are not atomic across the entire update.
-For example, you have segments such as the following:
-
-```
-foo_2015-01-01/2015-01-02_v1_0
-foo_2015-01-02/2015-01-03_v1_1
-foo_2015-01-03/2015-01-04_v1_2
-```
-
-`v2` segments will be loaded into the cluster as soon as they are built and replace `v1` segments for the period of time the
-segments overlap. Before v2 segments are completely loaded, your cluster may have a mixture of `v1` and `v2` segments.
-
-```
-foo_2015-01-01/2015-01-02_v1_0
-foo_2015-01-02/2015-01-03_v2_1
-foo_2015-01-03/2015-01-04_v1_2
-```
-
-In this case, queries may hit a mixture of `v1` and `v2` segments.
-
-### Different schemas among segments
Review comment:
This section is 100% duplicated from ../design/segments.md.
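
For reference, the per-interval replacement behavior described in the removed section can be sketched roughly as follows. This is a simplified illustration, not Druid's implementation: `parse_segment_id` and `visible_segments` are hypothetical helpers, the ID layout is taken from the examples in the diff, versions are assumed to compare as plain strings, and the sketch ignores the rule that queries switch only once all segments of the new version are loaded.

```python
from collections import defaultdict


def parse_segment_id(segment_id):
    """Split an ID shaped like 'foo_2015-01-01/2015-01-02_v1_0' into its parts.

    Assumes the layout datasource_interval_version_partition from the examples.
    """
    datasource, interval, version, partition = segment_id.rsplit("_", 3)
    return datasource, interval, version, int(partition)


def visible_segments(segment_ids):
    """Return, per (datasource, interval), only the segments of the highest version."""
    by_interval = defaultdict(list)
    for sid in segment_ids:
        datasource, interval, version, _ = parse_segment_id(sid)
        by_interval[(datasource, interval)].append((version, sid))

    visible = []
    for key in sorted(by_interval):
        entries = by_interval[key]
        # Assumption: version strings are ordered by plain string comparison.
        newest = max(version for version, _ in entries)
        visible.extend(sorted(sid for version, sid in entries if version == newest))
    return visible


# Mid-update state: only the middle interval has a v2 segment so far.
loaded = [
    "foo_2015-01-01/2015-01-02_v1_0",
    "foo_2015-01-02/2015-01-03_v1_1",
    "foo_2015-01-02/2015-01-03_v2_1",
    "foo_2015-01-03/2015-01-04_v1_2",
]
print(visible_segments(loaded))
```

With this input, the middle interval resolves to its `v2` segment while the other two intervals still resolve to `v1`, which is the mixed state the removed text warns about for updates that span multiple intervals.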