loquisgon commented on a change in pull request #11541:
URL: https://github.com/apache/druid/pull/11541#discussion_r683699109
##########
File path: docs/ingestion/index.md
##########
@@ -22,33 +22,24 @@ title: "Ingestion"
~ under the License.
-->
-All data in Druid is organized into _segments_, which are data files each of which may have up to a few million rows.
-Loading data in Druid is called _ingestion_ or _indexing_, and consists of reading data from a source system and creating
-segments based on that data.
+Loading data in Druid is called _ingestion_ or _indexing_. When you ingest data into Druid, Druid reads the data from your source system and stores it in data files called _segments_. In general, segment files contain a few million rows.
-In most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes
-(or the [Indexer](../design/indexer.md) processes) load your source data. One exception is
-Hadoop-based ingestion, where this work is instead done using a Hadoop MapReduce job on YARN (although MiddleManager or Indexer
-processes are still involved in starting and monitoring the Hadoop jobs).
+For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes or the [Indexer](../design/indexer.md) processes load your source data. One exception is
+Hadoop-based ingestion, which uses a Hadoop MapReduce job on YARN. MiddleManager or Indexer processes are still used to start and monitor the Hadoop jobs.
-Once segments have been generated and stored in [deep storage](../dependencies/deep-storage.md), they are loaded by Historical processes.
-For more details on how this works, see the [Storage design](../design/architecture.md#storage-design) section
-of Druid's design documentation.
+After Druid creates segments and stores them in [deep storage](../dependencies/deep-storage.md), Historical processes load them to respond to queries. See the [Storage design](../design/architecture.md#storage-design) section of the Druid design documentation for more information.
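
For readers who want a concrete sense of what "an ingestion method" looks like in practice, the sketch below is a minimal native batch (`index_parallel`) ingestion spec. The datasource name, input directory, timestamp column, and dimension names are all hypothetical placeholders, not values from this PR:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "local", "baseDir": "/tmp/data", "filter": "*.json" },
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "example_datasource",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["page", "user"] },
      "granularitySpec": { "segmentGranularity": "day", "queryGranularity": "none", "rollup": false }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

Submitting a spec like this to the Overlord runs the task on MiddleManager (or Indexer) processes, which is the path the rewritten paragraph above describes.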
Review comment:
We already said in the first paragraph that Druid creates segments. Now we are saying that they get stored in a special place. I would rephrase this as:
"Segments created by the ingestion process get stored in [deep storage...], which in turn are loaded onto Historical nodes by Historical processes in order to respond to queries. See the [Storage design..].."
At some point the distinction has to be made between queries served by Historical processes and those served by real-time (i.e. MiddleManager/Indexer) processes. BTW the latter only happens for streaming ingestion.
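
To make the streaming case above concrete: real-time queries against MiddleManager/Indexer processes arise from supervisor-based streaming ingestion, such as a Kafka supervisor spec like the minimal sketch below. The topic name, broker address, and column names are hypothetical placeholders:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "example_stream",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["page", "user"] },
      "granularitySpec": { "segmentGranularity": "hour", "queryGranularity": "none" }
    },
    "ioConfig": {
      "topic": "example_topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "localhost:9092" },
      "useEarliestOffset": true
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

With a spec like this running, recent rows are queryable from the ingestion tasks before they are published as segments and handed off to Historicals, which is the distinction the comment asks the docs to draw.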
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]