This is an automated email from the ASF dual-hosted git repository.
fjy pushed a commit to branch 0.14.0-incubating
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git
The following commit(s) were added to refs/heads/0.14.0-incubating by this push:
new 510de41 Add doc for Hadoop-based ingestion vs Native batch ingestion (#7044) (#7103)
510de41 is described below
commit 510de418995ed1f5e33cb5c317c20c293312406c
Author: Jihoon Son <[email protected]>
AuthorDate: Tue Feb 19 21:07:58 2019 -0800
Add doc for Hadoop-based ingestion vs Native batch ingestion (#7044) (#7103)
* Add doc for Hadoop-based ingestion vs Native batch ingestion
* add links
* add links
---
docs/content/ingestion/hadoop-vs-native-batch.md | 43 ++++++++++++++++++++++++
docs/content/ingestion/hadoop.md | 4 ++-
docs/content/ingestion/native_tasks.md | 2 ++
docs/content/ingestion/tasks.md | 4 +++
4 files changed, 52 insertions(+), 1 deletion(-)
diff --git a/docs/content/ingestion/hadoop-vs-native-batch.md b/docs/content/ingestion/hadoop-vs-native-batch.md
new file mode 100644
index 0000000..ce2c97e
--- /dev/null
+++ b/docs/content/ingestion/hadoop-vs-native-batch.md
@@ -0,0 +1,43 @@
+---
+layout: doc_page
+title: "Hadoop-based Batch Ingestion VS Native Batch Ingestion"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+# Comparison of Batch Ingestion Methods
+
+Druid basically supports three types of batch ingestion: Hadoop-based
+batch ingestion, native parallel batch ingestion, and native local batch
+ingestion. The below table shows what features are supported by each
+ingestion method.
+
+
+| |Hadoop-based ingestion|Native parallel ingestion|Native local ingestion|
+|---|----------------------|-------------------------|----------------------|
+| Parallel indexing | Always parallel | Parallel if firehose is splittable | Always sequential |
+| Supported indexing modes | Replacing mode | Both appending and replacing modes | Both appending and replacing modes |
+| External dependency | Hadoop (it internally submits Hadoop jobs) | No dependency | No dependency |
+| Supported [rollup modes](http://druid.io/docs/latest/ingestion/index.html#roll-up-modes) | Perfect rollup | Best-effort rollup | Both perfect and best-effort rollup |
+| Supported partitioning methods | [Both Hash-based and range partitioning](http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification) | N/A | Hash-based partitioning (when `forceGuaranteedRollup` = true) |
+| Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented [firehoses](./firehose.html) | All implemented [firehoses](./firehose.html) |
+| Supported file formats | All implemented Hadoop InputFormats | Currently only text file format (CSV, TSV, JSON) | Currently only text file format (CSV, TSV, JSON) |
+| Saving parse exceptions in ingestion report | Currently not supported | Currently not supported | Supported |
+| Custom segment version | Supported, but this is NOT recommended | N/A | N/A |
diff --git a/docs/content/ingestion/hadoop.md b/docs/content/ingestion/hadoop.md
index 4f8174c..c824fd0 100644
--- a/docs/content/ingestion/hadoop.md
+++ b/docs/content/ingestion/hadoop.md
@@ -25,7 +25,9 @@ title: "Hadoop-based Batch Ingestion"
# Hadoop-based Batch Ingestion
Hadoop-based batch ingestion in Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running
-instance of a Druid [Overlord](../design/overlord.html).
+instance of a Druid [Overlord](../design/overlord.html).
+
+Please check [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for differences between native batch ingestion and Hadoop-based ingestion.
## Command Line Hadoop Indexer
diff --git a/docs/content/ingestion/native_tasks.md b/docs/content/ingestion/native_tasks.md
index e5b2e7d..b9657d1 100644
--- a/docs/content/ingestion/native_tasks.md
+++ b/docs/content/ingestion/native_tasks.md
@@ -28,6 +28,8 @@ Druid currently has two types of native batch indexing tasks, `index_parallel` w
in parallel on multiple MiddleManager nodes, and `index` which will run a single indexing task locally on a single
MiddleManager.
+Please check [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for differences between native batch ingestion and Hadoop-based ingestion.
+
Parallel Index Task
--------------------------------
diff --git a/docs/content/ingestion/tasks.md b/docs/content/ingestion/tasks.md
index 41f7b52..4653d6b 100644
--- a/docs/content/ingestion/tasks.md
+++ b/docs/content/ingestion/tasks.md
@@ -41,6 +41,10 @@ See [batch ingestion](../ingestion/hadoop.html).
Druid provides a native index task which doesn't need any dependencies on other systems.
See [native index tasks](./native_tasks.html) for more details.
+<div class="note info">
+Please check [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for differences between native batch ingestion and Hadoop-based ingestion.
+</div>
+
### Kafka Indexing Tasks
Kafka Indexing tasks are automatically created by a Kafka Supervisor and are responsible for pulling data from Kafka streams. These tasks are not meant to be created/submitted directly by users. See [Kafka Indexing Service](../development/extensions-core/kafka-ingestion.html) for more details.