loquisgon commented on a change in pull request #11541:
URL: https://github.com/apache/druid/pull/11541#discussion_r683699109
##########
File path: docs/ingestion/index.md
##########
@@ -22,33 +22,24 @@ title: "Ingestion"
~ under the License.
-->
-All data in Druid is organized into _segments_, which are data files each of which may have up to a few million rows.
-Loading data in Druid is called _ingestion_ or _indexing_, and consists of reading data from a source system and creating
-segments based on that data.
+Loading data in Druid is called _ingestion_ or _indexing_. When you ingest data into Druid, Druid reads the data from your source system and stores it in data files called _segments_. In general, segment files contain a few million rows.
-In most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes
-(or the [Indexer](../design/indexer.md) processes) load your source data. One exception is
-Hadoop-based ingestion, where this work is instead done using a Hadoop MapReduce job on YARN (although MiddleManager or Indexer
-processes are still involved in starting and monitoring the Hadoop jobs).
+For most ingestion methods, the Druid [MiddleManager](../design/middlemanager.md) processes or the [Indexer](../design/indexer.md) processes load your source data. One exception is
+Hadoop-based ingestion, which uses a Hadoop MapReduce job on YARN. MiddleManager or Indexer processes are still used to start and monitor the Hadoop jobs.
-Once segments have been generated and stored in [deep storage](../dependencies/deep-storage.md), they are loaded by Historical processes.
-For more details on how this works, see the [Storage design](../design/architecture.md#storage-design) section
-of Druid's design documentation.
+After Druid creates segments and stores them in [deep storage](../dependencies/deep-storage.md), Historical processes load them to respond to queries. See the [Storage design](../design/architecture.md#storage-design) section of the Druid design documentation for more information.
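
For readers who want a concrete sense of what "an ingestion method" looks like in practice, the sketch below is a minimal native batch (`index_parallel`) ingestion spec. The datasource name, input directory, timestamp column, and dimension names are all hypothetical placeholders, not values from this PR:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "local", "baseDir": "/tmp/data", "filter": "*.json" },
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "example_datasource",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["page", "user"] },
      "granularitySpec": { "segmentGranularity": "day", "queryGranularity": "none", "rollup": false }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

Submitting a spec like this to the Overlord runs the task on MiddleManager (or Indexer) processes, which is the path the rewritten paragraph above describes.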
Review comment:
We already said in the first paragraph that Druid creates segments. Now we are saying that they get stored in a special place. I would rephrase this as:
"Segments created by the ingestion process get stored in [deep storage...], which in turn are loaded onto Historical nodes by Historical processes in order to respond to queries. See the [Storage design..].."
At some point the distinction has to be made between queries served by Historical processes and those served by real-time (i.e. MiddleManager/Indexer) processes. BTW the latter only happens for streaming ingestion.
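
To make the streaming case above concrete: real-time queries against MiddleManager/Indexer processes arise from supervisor-based streaming ingestion, such as a Kafka supervisor spec like the minimal sketch below. The topic name, broker address, and column names are hypothetical placeholders:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "example_stream",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["page", "user"] },
      "granularitySpec": { "segmentGranularity": "hour", "queryGranularity": "none" }
    },
    "ioConfig": {
      "topic": "example_topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "localhost:9092" },
      "useEarliestOffset": true
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

With a spec like this running, recent rows are queryable from the ingestion tasks before they are published as segments and handed off to Historicals, which is the distinction the comment asks the docs to draw.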
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]