ccaominh commented on a change in pull request #8311: Docusaurus build framework + ingestion doc refresh.
URL: https://github.com/apache/incubator-druid/pull/8311#discussion_r314431778
 
 

 ##########
 File path: docs/ingestion/index.md
 ##########
 @@ -0,0 +1,769 @@
+---
+id: index
+title: "Ingestion"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## Overview
+
+All data in Druid is organized into _segments_, which are data files that generally have up to a few million rows each.
+Loading data in Druid is called _ingestion_ or _indexing_ and consists of reading data from a source system and creating
+segments based on that data.
+
+In most ingestion methods, the work of loading data is done by Druid MiddleManager processes. One exception is
+Hadoop-based ingestion, where this work is instead done using a Hadoop MapReduce job on YARN (although MiddleManager
+processes are still involved in starting and monitoring the Hadoop jobs). Once segments have been generated and stored
+in [deep storage](../dependencies/deep-storage.md), they will be loaded by Historical processes. For more details on
+how this works under the hood, see the [Storage design](../design/architecture.md#storage-design) section of Druid's design
+documentation.
+
+## How to use this documentation
+
+This **page you are currently reading** provides information about universal Druid ingestion concepts, and about
+configurations that are common to all [ingestion methods](#ingestion-methods).
+
+The **individual pages for each ingestion method** provide additional information about concepts and configurations
+that are unique to each ingestion method.
+
+We recommend reading (or at least skimming) this universal page first, and then referring to the page for the
+ingestion method or methods that you have chosen.
+
+## Ingestion methods
+
+The table below lists Druid's most common data ingestion methods, along with comparisons to help you choose
+the best one for your situation. Each ingestion method supports its own set of source systems to pull from. For details
+about how each method works, as well as configuration properties specific to that method, check out its documentation
+page.
+
+### Streaming
+
+The most recommended, and most popular, method of streaming ingestion is the
+[Kafka indexing service](../development/extensions-core/kafka-ingestion.md) that reads directly from Kafka. The Kinesis
+indexing service also works well if you prefer Kinesis.
+
+This table compares the major available options:
+
+| **Method** | [Kafka](../development/extensions-core/kafka-ingestion.md) | [Kinesis](../development/extensions-core/kinesis-ingestion.md) | [Tranquility](tranquility.md) |
+|---|-----|--------------|------------|
+| **Supervisor type** | `kafka` | `kinesis` | N/A |
+| **How it works** | Druid reads directly from Apache Kafka. | Druid reads directly from Amazon Kinesis. | Tranquility, a library that ships separately from Druid, is used to push data into Druid. |
+| **Can ingest late data?** | Yes | Yes | No (late data is dropped based on the `windowPeriod` config) |
+| **Exactly-once guarantees?** | Yes | Yes | No |
+
+### Batch
+
+When doing batch loads from files, you should use one-time [tasks](tasks.md), and you have three options: `index`
+(native batch; single-task), `index_parallel` (native batch; parallel), or `index_hadoop` (Hadoop-based). The following
+table compares and contrasts the three batch ingestion options.
 
 Review comment:
   This last sentence is slightly out of place. Perhaps merge it with the sentence on line 81?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
