sthetland commented on a change in pull request #11051:
URL: https://github.com/apache/druid/pull/11051#discussion_r604412994



##########
File path: docs/design/index.md
##########
@@ -22,79 +22,74 @@ title: "Introduction to Apache Druid"
   ~ under the License.
   -->
 
-## What is Druid?
+Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics ("[OLAP](http://en.wikipedia.org/wiki/Online_analytical_processing)" queries) on large data sets. Most often, Druid powers use cases where real-time ingestion, fast query performance, and high uptime are important.
 
-Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics
-("[OLAP](http://en.wikipedia.org/wiki/Online_analytical_processing)" queries) on large data sets. Druid is most often
-used as a database for powering use cases where real-time ingest, fast query performance, and high uptime are important.
-As such, Druid is commonly used for powering GUIs of analytical applications, or as a backend for highly-concurrent APIs
-that need fast aggregations. Druid works best with event-oriented data.
+Druid is commonly used as the database backend for GUIs of analytical applications, or for highly-concurrent APIs that need fast aggregations. Druid works best with event-oriented data.
 
 Common application areas for Druid include:
 
-- Clickstream analytics (web and mobile analytics)
-- Network telemetry analytics (network performance monitoring)
+- Clickstream analytics including web and mobile analytics
+- Network telemetry analytics including network performance monitoring
 - Server metrics storage
-- Supply chain analytics (manufacturing metrics)
+- Supply chain analytics including manufacturing metrics
 - Application performance metrics
 - Digital marketing/advertising analytics
-- Business intelligence / OLAP
+- Business intelligence/OLAP
+
+## Key features of Druid
 
 Druid's core architecture combines ideas from data warehouses, timeseries databases, and logsearch systems. Some of
 Druid's key features are:
 
-1. **Columnar storage format.** Druid uses column-oriented storage, meaning it only needs to load the exact columns
-needed for a particular query.  This gives a huge speed boost to queries that only hit a few columns. In addition, each
-column is stored optimized for its particular data type, which supports fast scans and aggregations.
-2. **Scalable distributed system.** Druid is typically deployed in clusters of tens to hundreds of servers, and can
-offer ingest rates of millions of records/sec, retention of trillions of records, and query latencies of sub-second to a
-few seconds.
-3. **Massively parallel processing.** Druid can process a query in parallel across the entire cluster.
-4. **Realtime or batch ingestion.** Druid can ingest data either real-time (ingested data is immediately available for
-querying) or in batches.
-5. **Self-healing, self-balancing, easy to operate.** As an operator, to scale the cluster out or in, simply add or
-remove servers and the cluster will rebalance itself automatically, in the background, without any downtime. If any
-Druid servers fail, the system will automatically route around the damage until those servers can be replaced. Druid
-is designed to run 24/7 with no need for planned downtimes for any reason, including configuration changes and software
+1. **Columnar storage format.** Druid uses column-oriented storage. This means it only loads the exact columns
+needed for a particular query.  This greatly improves speed for queries that retrieve only a few columns. Additionally, to support fast scans and aggregations, Druid optimizes column storage for each column according to its data type.
+2. **Scalable distributed system.** Typical Druid deployments span clusters ranging from tens to hundreds of servers. Druid can ingest data at the rate of millions of records per second while retaining trillions of records and maintaining query latencies ranging from the sub-second to a few seconds.
+3. **Massively parallel processing.** Druid can process each query in parallel across the entire cluster.
+4. **Realtime or batch ingestion.** Druid can ingest data either real-time or in batches. Ingested data is immediately available for
+querying.
+5. **Self-healing, self-balancing, easy to operate.** As an operator, you add servers to scale out or
+remove servers to scale down. The Druid cluster re-balances itself automatically in the background without any downtime. If a
+Druid server fails, the system automatically routes data around the damage until the server can be replaced. Druid
+is designed to run continuously without planned downtime for any reason. This is true for configuration changes and software
 updates.
-6. **Cloud-native, fault-tolerant architecture that won't lose data.** Once Druid has ingested your data, a copy is
-stored safely in [deep storage](architecture.md#deep-storage) (typically cloud storage, HDFS, or a shared filesystem).
-Your data can be recovered from deep storage even if every single Druid server fails. For more limited failures affecting
-just a few Druid servers, replication ensures that queries are still possible while the system recovers.
+6. **Cloud-native, fault-tolerant architecture that won't lose data.** After ingestion, Druid safely stores a copy of your data in [deep storage](architecture.md#deep-storage). Deep storage is typically cloud storage, HDFS, or a shared filesystem. You can recover your data from deep storage even in the unlikely case that all Druid servers fail. For a limited failure that affects only a few Druid servers, Druid uses replication to ensure that queries are still possible during system recovers.

Review comment:
       I think "recovers" -> "recoveries", although the original seems to refer a little more clearly to "For a limited failure that affects..":
   
   ```suggestion
   6. **Cloud-native, fault-tolerant architecture that won't lose data.** After ingestion, Druid safely stores a copy of your data in [deep storage](architecture.md#deep-storage). Deep storage is typically cloud storage, HDFS, or a shared filesystem. You can recover your data from deep storage even in the unlikely case that all Druid servers fail. For a limited failure that affects only a few Druid servers, replication ensures that queries are still possible during system recoveries.
   ```
   

##########
File path: docs/design/index.md
##########
 7. **Indexes for quick filtering.** Druid uses [Roaring](https://roaringbitmap.org/) or
-[CONCISE](https://arxiv.org/pdf/1004.0403) compressed bitmap indexes to create indexes that power fast filtering and
-searching across multiple columns.
-8. **Time-based partitioning.** Druid first partitions data by time, and can additionally partition based on other fields.
-This means time-based queries will only access the partitions that match the time range of the query. This leads to
-significant performance improvements for time-based data.
+[CONCISE](https://arxiv.org/pdf/1004.0403) compressed bitmap indexes to create indexes to enable fast filtering and searching across multiple columns.
+8. **Time-based partitioning.** Druid first partitions data by time. YOu can optionally implement additional partitioning based upon other fields.

Review comment:
       ```suggestion
   8. **Time-based partitioning.** Druid first partitions data by time. You can optionally implement additional partitioning based upon other fields.
   ```

##########
File path: docs/design/index.md
##########
+Time-based queries only access the partitions that match the time range of the query which leads to significant performance improvements.
 9. **Approximate algorithms.** Druid includes algorithms for approximate count-distinct, approximate ranking, and
 computation of approximate histograms and quantiles. These algorithms offer bounded memory usage and are often
 substantially faster than exact computations. For situations where accuracy is more important than speed, Druid also
 offers exact count-distinct and exact ranking.
 10. **Automatic summarization at ingest time.** Druid optionally supports data summarization at ingestion time. This
-summarization partially pre-aggregates your data, and can lead to big costs savings and performance boosts.
+summarization partially pre-aggregates your data potentially leading to significant cost savings and performance boosts.

Review comment:
       The rewording is good, but it runs together without a comma, I think:
     
   ```suggestion
   summarization partially pre-aggregates your data, potentially leading to significant cost savings and performance boosts.
   ```

##########
File path: docs/design/index.md
##########
 
-## When should I use Druid?
+## When to use Druid
 
-Druid is used by many companies of various sizes for many different use cases. Check out the
-[Powered by Apache Druid](/druid-powered) page
+Druid is used by many companies of various sizes for many different use cases. For more information see
+[Powered by Apache Druid](/druid-powered).
 
-Druid is likely a good choice if your use case fits a few of the following descriptors:
+Druid is likely a good choice if your use case has matches a few of the following:

Review comment:
       ```suggestion
   Druid is likely a good choice if your use case matches a few of the following:
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
