jihoonson commented on a change in pull request #8311: Docusaurus build framework + ingestion doc refresh.
URL: https://github.com/apache/incubator-druid/pull/8311#discussion_r315428687
##########
File path: docs/design/index.md
##########
@@ -0,0 +1,100 @@
+---
+id: index
+title: "Introduction to Apache Druid"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements. See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership. The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License. You may obtain a copy of the License at
+  ~
+  ~ http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied. See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+## What is Druid?
+
+Apache Druid (incubating) is a real-time analytics database designed for fast slice-and-dice analytics
+("[OLAP](http://en.wikipedia.org/wiki/Online_analytical_processing)" queries) on large data sets. Druid is most often
+used as a database for powering use cases where real-time ingest, fast query performance, and high uptime are important.
+As such, Druid is commonly used for powering GUIs of analytical applications, or as a backend for highly-concurrent APIs
+that need fast aggregations. Druid works best with event-oriented data.
+
+Common application areas for Druid include:
+
+- Clickstream analytics (web and mobile analytics)
+- Network telemetry analytics (network performance monitoring)
+- Server metrics storage
+- Supply chain analytics (manufacturing metrics)
+- Application performance metrics
+- Digital marketing/advertising analytics
+- Business intelligence / OLAP
+
+Druid's core architecture combines ideas from data warehouses, timeseries databases, and logsearch systems. Some of
+Druid's key features are:
+
+1. **Columnar storage format.** Druid uses column-oriented storage, meaning it only needs to load the exact columns
+needed for a particular query. This gives a huge speed boost to queries that only hit a few columns. In addition, each
+column is stored optimized for its particular data type, which supports fast scans and aggregations.
+2. **Scalable distributed system.** Druid is typically deployed in clusters of tens to hundreds of servers, and can
+offer ingest rates of millions of records/sec, retention of trillions of records, and query latencies of sub-second to a
+few seconds.
+3. **Massively parallel processing.** Druid can process a query in parallel across the entire cluster.
+4. **Realtime or batch ingestion.** Druid can ingest data either real-time (ingested data is immediately available for
+querying) or in batches.
+5. **Self-healing, self-balancing, easy to operate.** As an operator, to scale the cluster out or in, simply add or
+remove servers and the cluster will rebalance itself automatically, in the background, without any downtime. If any
+Druid servers fail, the system will automatically route around the damage until those servers can be replaced. Druid
+is designed to run 24/7 with no need for planned downtimes for any reason, including configuration changes and software
+updates.
+6. **Cloud-native, fault-tolerant architecture that won't lose data.** Once Druid has ingested your data, a copy is
+stored safely in [deep storage](#deep-storage) (typically cloud storage, HDFS, or a shared filesystem). Your data can be

Review comment:
Broken link.
