[GitHub] [druid] techdocsmith commented on a change in pull request #10935: First refactor of compaction

GitBox Wed, 17 Mar 2021 09:25:25 -0700


techdocsmith commented on a change in pull request #10935:
URL: https://github.com/apache/druid/pull/10935#discussion_r596182816




##########
File path: docs/ingestion/data-management.md
##########
@@ -21,173 +21,9 @@ title: "Data management"
   ~ specific language governing permissions and limitations
   ~ under the License.
   -->
+Within the context of this topic Data management refers to Apache Druid's data 
maintenance capabilities for existing datasources. There are several options to 
help you keep your data relevant and to help your Druid cluster remain 
performant. For example updating, reingesting, adding lookups, reindexing, or 
deleting data.
 
-
-
-
-## Schema changes
-
-Schemas for datasources can change at any time and Apache Druid supports 
different schemas among segments.
-
-### Replacing segments
-
-Druid uniquely
-identifies segments using the datasource, interval, version, and partition 
number. The partition number is only visible in the segment id if
-there are multiple segments created for some granularity of time. For example, 
if you have hourly segments, but you
-have more data in an hour than a single segment can hold, you can create 
multiple segments for the same hour. These segments will share
-the same datasource, interval, and version, but have linearly increasing 
partition numbers.
-
-```
-foo_2015-01-01/2015-01-02_v1_0
-foo_2015-01-01/2015-01-02_v1_1
-foo_2015-01-01/2015-01-02_v1_2
-```
-
-In the example segments above, the dataSource = foo, interval = 
2015-01-01/2015-01-02, version = v1, partitionNum = 0.
-If at some later point in time, you reindex the data with a new schema, the 
newly created segments will have a higher version id.
-
-```
-foo_2015-01-01/2015-01-02_v2_0
-foo_2015-01-01/2015-01-02_v2_1
-foo_2015-01-01/2015-01-02_v2_2
-```
-
-Druid batch indexing (either Hadoop-based or IndexTask-based) guarantees 
atomic updates on an interval-by-interval basis.
-In our example, until all `v2` segments for `2015-01-01/2015-01-02` are loaded 
in a Druid cluster, queries exclusively use `v1` segments.
-Once all `v2` segments are loaded and queryable, all queries ignore `v1` 
segments and switch to the `v2` segments.
-Shortly afterwards, the `v1` segments are unloaded from the cluster.
-
-Note that updates that span multiple segment intervals are only atomic within 
each interval. They are not atomic across the entire update.
-For example, you have segments such as the following:
-
-```
-foo_2015-01-01/2015-01-02_v1_0
-foo_2015-01-02/2015-01-03_v1_1
-foo_2015-01-03/2015-01-04_v1_2
-```
-
-`v2` segments will be loaded into the cluster as soon as they are built and 
replace `v1` segments for the period of time the
-segments overlap. Before v2 segments are completely loaded, your cluster may 
have a mixture of `v1` and `v2` segments.
-
-```
-foo_2015-01-01/2015-01-02_v1_0
-foo_2015-01-02/2015-01-03_v2_1
-foo_2015-01-03/2015-01-04_v1_2
-```
-
-In this case, queries may hit a mixture of `v1` and `v2` segments.
-
-### Different schemas among segments

Review comment:
       The removed content is 100% duplicated from ../design/segments.md.
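
For reference, the versioning behavior described in the removed text (segments identified by datasource, interval, version, and partition number, with higher versions replacing lower ones per interval) can be sketched roughly as below. This is only an illustration, not Druid code: `SegmentId`, `parse_segment_id`, and `queryable_segments` are hypothetical names, the ID layout is taken from the simplified examples in the diff, versions are compared as plain strings, and every listed segment is assumed to be fully loaded.

```python
from collections import defaultdict
from typing import NamedTuple


class SegmentId(NamedTuple):
    # Hypothetical container mirroring the four identifying fields
    # named in the removed text.
    datasource: str
    interval: str
    version: str
    partition: int


def parse_segment_id(raw: str) -> SegmentId:
    # Split from the right so underscores in a datasource name survive.
    head, version, partition = raw.rsplit("_", 2)
    datasource, interval = head.rsplit("_", 1)
    return SegmentId(datasource, interval, version, int(partition))


def queryable_segments(raw_ids):
    # Group segments by (datasource, interval); the switch between
    # versions happens per interval, not across the whole update.
    by_interval = defaultdict(list)
    for raw in raw_ids:
        seg = parse_segment_id(raw)
        by_interval[(seg.datasource, seg.interval)].append(seg)

    visible = []
    for key in sorted(by_interval):
        segs = by_interval[key]
        # Assumption: string comparison is enough to order versions
        # in this toy example ("v1" < "v2").
        newest = max(s.version for s in segs)
        visible.extend(sorted(s for s in segs if s.version == newest))
    return visible
```

With the mixed `v1`/`v2` listing from the diff, each interval is resolved on its own, so a query would see `v1` for the first and third day and `v2` for the second.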




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



