DRILL-2315: Confluence conversion plus fixes
Project: http://git-wip-us.apache.org/repos/asf/drill/repo Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/d959a210 Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/d959a210 Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/d959a210 Branch: refs/heads/gh-pages Commit: d959a210053f02b5069f0a0cb9f0d34131640ffb Parents: 23f82db Author: Kristine Hahn <kh...@maprtech.com> Authored: Thu Jan 15 19:42:12 2015 -0800 Committer: Bridget Bevens <bbev...@maprtech.com> Committed: Wed Feb 25 16:22:24 2015 -0800 ---------------------------------------------------------------------- .gitignore | 1 + _docs/001-arch.md | 49 +++ _docs/001-drill-docs.md | 4 - _docs/002-tutorial.md | 51 +++ _docs/003-yelp.md | 412 ++++++++++++++++++ _docs/004-install.md | 13 + _docs/005-connect.md | 41 ++ _docs/006-interfaces.md | 50 +++ _docs/007-query.md | 41 ++ _docs/008-sql-ref.md | 14 + _docs/009-dev-custom-func.md | 37 ++ _docs/010-manage.md | 14 + _docs/011-develop.md | 9 + _docs/012-rn.md | 191 +++++++++ _docs/013-contribute.md | 9 + _docs/014-sample-ds.md | 10 + _docs/015-design.md | 13 + _docs/016-progress.md | 8 + _docs/017-archived-pages.md | 8 + _docs/018-bylaws.md | 170 ++++++++ _docs/arch/001-core-mod.md | 29 ++ _docs/arch/002-arch-hilite.md | 10 + _docs/arch/arch-hilite/001-flexibility.md | 78 ++++ _docs/arch/arch-hilite/002-performance.md | 55 +++ _docs/archive/001-how-to-demo.md | 309 ++++++++++++++ _docs/archive/002-meet-drill.md | 41 ++ _docs/connect/001-plugin-reg.md | 35 ++ _docs/connect/002-workspaces.md | 74 ++++ _docs/connect/003-reg-fs.md | 64 +++ _docs/connect/004-reg-hbase.md | 32 ++ _docs/connect/005-reg-hive.md | 83 ++++ _docs/connect/006-default-frmt.md | 60 +++ _docs/connect/007-mongo-plugin.md | 167 ++++++++ _docs/connect/008-mapr-db-plugin.md | 31 ++ _docs/contribute/001-guidelines.md | 229 ++++++++++ _docs/contribute/002-ideas.md | 158 +++++++ _docs/datasets/001-aol.md | 47 +++ _docs/datasets/002-enron.md | 19 + _docs/datasets/003-wikipedia.md 
| 105 +++++ _docs/design/001-plan.md | 25 ++ _docs/design/002-rpc.md | 19 + _docs/design/003-query-stages.md | 42 ++ _docs/design/004-research.md | 48 +++ _docs/design/005-value.md | 163 +++++++ _docs/dev-custom-fcn/001-dev-simple.md | 50 +++ _docs/dev-custom-fcn/002-dev-aggregate.md | 55 +++ _docs/dev-custom-fcn/003-add-custom.md | 26 ++ _docs/dev-custom-fcn/004-use-custom.md | 55 +++ _docs/dev-custom-fcn/005-cust-interface.md | 8 + _docs/develop/001-compile.md | 37 ++ _docs/develop/002-setup.md | 5 + _docs/develop/003-patch-tool.md | 160 +++++++ _docs/drill-docs/001-arch.md | 58 --- _docs/drill-docs/002-tutorial.md | 58 --- _docs/drill-docs/003-yelp.md | 402 ------------------ _docs/drill-docs/004-install.md | 20 - _docs/drill-docs/005-connect.md | 49 --- _docs/drill-docs/006-query.md | 57 --- _docs/drill-docs/006-sql-ref.md | 25 -- _docs/drill-docs/007-dev-custom-func.md | 47 --- _docs/drill-docs/008-manage.md | 23 - _docs/drill-docs/009-develop.md | 16 - _docs/drill-docs/010-rn.md | 192 --------- _docs/drill-docs/011-contribute.md | 11 - _docs/drill-docs/012-sample-ds.md | 11 - _docs/drill-docs/013-design.md | 14 - _docs/drill-docs/014-progress.md | 9 - _docs/drill-docs/015-archived-pages.md | 9 - _docs/drill-docs/016-bylaws.md | 171 -------- _docs/drill-docs/arch/001-core-mod.md | 30 -- _docs/drill-docs/arch/002-arch-hilite.md | 15 - .../arch/arch-hilite/001-flexibility.md | 79 ---- .../arch/arch-hilite/002-performance.md | 56 --- _docs/drill-docs/archive/001-how-to-demo.md | 309 -------------- _docs/drill-docs/archive/002-meet-drill.md | 41 -- _docs/drill-docs/connect/001-plugin-reg.md | 39 -- _docs/drill-docs/connect/002-mongo-plugin.md | 169 -------- _docs/drill-docs/connect/003-mapr-db-plugin.md | 32 -- .../connect/workspaces/001-workspaces.md | 82 ---- .../drill-docs/connect/workspaces/002-reg-fs.md | 69 --- .../connect/workspaces/003-reg-hbase.md | 34 -- .../connect/workspaces/004-reg-hive.md | 99 ----- .../connect/workspaces/005-default-frmt.md | 61 --- 
_docs/drill-docs/contribute/001-guidelines.md | 230 ---------- _docs/drill-docs/contribute/002-ideas.md | 158 ------- _docs/drill-docs/datasets/001-aol.md | 47 --- _docs/drill-docs/datasets/002-enron.md | 21 - _docs/drill-docs/datasets/003-wikipedia.md | 105 ----- _docs/drill-docs/design/001-plan.md | 25 -- _docs/drill-docs/design/002-rpc.md | 19 - _docs/drill-docs/design/003-query-stages.md | 42 -- _docs/drill-docs/design/004-research.md | 48 --- _docs/drill-docs/design/005-value.md | 191 --------- .../drill-docs/dev-custom-fcn/001-dev-simple.md | 51 --- .../dev-custom-fcn/002-dev-aggregate.md | 59 --- .../drill-docs/dev-custom-fcn/003-add-custom.md | 28 -- .../drill-docs/dev-custom-fcn/004-use-custom.md | 55 --- .../dev-custom-fcn/005-cust-interface.md | 14 - _docs/drill-docs/develop/001-compile.md | 37 -- _docs/drill-docs/develop/002-setup.md | 5 - _docs/drill-docs/develop/003-patch-tool.md | 160 ------- _docs/drill-docs/install/001-drill-in-10.md | 395 ----------------- _docs/drill-docs/install/002-deploy.md | 102 ----- .../drill-docs/install/003-install-embedded.md | 30 -- .../install/004-install-distributed.md | 61 --- .../install-embedded/001-install-linux.md | 30 -- .../install/install-embedded/002-install-mac.md | 33 -- .../install/install-embedded/003-install-win.md | 57 --- _docs/drill-docs/manage/001-conf.md | 20 - _docs/drill-docs/manage/002-start-stop.md | 45 -- _docs/drill-docs/manage/003-ports.md | 9 - _docs/drill-docs/manage/004-partition-prune.md | 75 ---- _docs/drill-docs/manage/005-monitor-cancel.md | 30 -- _docs/drill-docs/manage/conf/001-mem-alloc.md | 31 -- _docs/drill-docs/manage/conf/002-startup-opt.md | 50 --- _docs/drill-docs/manage/conf/003-plan-exec.md | 37 -- .../drill-docs/manage/conf/004-persist-conf.md | 93 ---- _docs/drill-docs/progress/001-2014-q1.md | 204 --------- _docs/drill-docs/query/001-query-fs.md | 44 -- _docs/drill-docs/query/002-query-hbase.md | 177 -------- _docs/drill-docs/query/003-query-hive.md | 67 --- 
_docs/drill-docs/query/004-query-complex.md | 63 --- _docs/drill-docs/query/005-query-info-skema.md | 109 ----- _docs/drill-docs/query/006-query-sys-tbl.md | 176 -------- _docs/drill-docs/query/007-interfaces.md | 16 - _docs/drill-docs/query/interfaces/001-jdbc.md | 138 ------ _docs/drill-docs/query/interfaces/002-odbc.md | 23 - .../query/query-complex/001-sample-donuts.md | 40 -- .../query/query-complex/002-query1-select.md | 19 - .../query/query-complex/003-query2-use-sql.md | 74 ---- .../query/query-complex/004-query3-sel-nest.md | 50 --- .../query-complex/005-query4-sel-multiple.md | 24 -- .../drill-docs/query/query-fs/001-query-json.md | 41 -- .../query/query-fs/002-query-parquet.md | 99 ----- .../drill-docs/query/query-fs/003-query-text.md | 120 ------ .../drill-docs/query/query-fs/004-query-dir.md | 90 ---- _docs/drill-docs/rn/001-0.5.0rn.md | 29 -- _docs/drill-docs/rn/002-0.4.0rn.md | 42 -- _docs/drill-docs/rn/003-alpha-rn.md | 44 -- _docs/drill-docs/rn/004-0.6.0-rn.md | 32 -- _docs/drill-docs/rn/005-0.7.0-rn.md | 56 --- _docs/drill-docs/sql-ref/001-data-types.md | 96 ----- _docs/drill-docs/sql-ref/002-operators.md | 71 ---- _docs/drill-docs/sql-ref/003-functions.md | 185 -------- _docs/drill-docs/sql-ref/004-nest-functions.md | 10 - _docs/drill-docs/sql-ref/005-cmd-summary.md | 16 - _docs/drill-docs/sql-ref/006-reserved-wds.md | 16 - .../sql-ref/cmd-summary/001-create-table-as.md | 134 ------ .../sql-ref/cmd-summary/002-explain.md | 166 -------- .../sql-ref/cmd-summary/003-select.md | 85 ---- .../sql-ref/cmd-summary/004-show-files.md | 65 --- _docs/drill-docs/sql-ref/data-types/001-date.md | 148 ------- _docs/drill-docs/sql-ref/nested/001-flatten.md | 89 ---- _docs/drill-docs/sql-ref/nested/002-kvgen.md | 150 ------- .../sql-ref/nested/003-repeated-cnt.md | 34 -- .../drill-docs/tutorial/001-install-sandbox.md | 56 --- _docs/drill-docs/tutorial/002-get2kno-sb.md | 235 ----------- _docs/drill-docs/tutorial/003-lesson1.md | 423 ------------------- 
_docs/drill-docs/tutorial/004-lesson2.md | 392 ----------------- _docs/drill-docs/tutorial/005-lesson3.md | 379 ----------------- _docs/drill-docs/tutorial/006-summary.md | 14 - .../install-sandbox/001-install-mapr-vm.md | 55 --- .../install-sandbox/002-install-mapr-vb.md | 72 ---- _docs/img/58.png | Bin 0 -> 35404 bytes _docs/img/BI_to_Drill_2.png | Bin 0 -> 46126 bytes _docs/img/HbaseViewCreation0.png | Bin 0 -> 22945 bytes _docs/img/HbaseViewDSN.png | Bin 0 -> 32284 bytes _docs/img/Hbase_Browse.png | Bin 0 -> 147495 bytes _docs/img/Hive_DSN.png | Bin 0 -> 31302 bytes _docs/img/ODBC_CustomSQL.png | Bin 0 -> 41405 bytes _docs/img/ODBC_HbasePreview2.png | Bin 0 -> 130202 bytes _docs/img/ODBC_HbaseView.png | Bin 0 -> 36774 bytes _docs/img/ODBC_HiveConnection.png | Bin 0 -> 33385 bytes _docs/img/ODBC_to_Drillbit.png | Bin 0 -> 6694 bytes _docs/img/ODBC_to_Quorum.png | Bin 0 -> 11684 bytes _docs/img/Parquet_DSN.png | Bin 0 -> 31356 bytes _docs/img/Parquet_Preview.png | Bin 0 -> 78339 bytes _docs/img/RegionParquet_table.png | Bin 0 -> 90698 bytes _docs/img/SelectHbaseView.png | Bin 0 -> 28721 bytes _docs/img/Untitled.png | Bin 0 -> 39796 bytes _docs/img/VoterContributions_hbaseview.png | Bin 0 -> 84225 bytes _docs/img/ngram_plugin.png | Bin 0 -> 51922 bytes _docs/img/ngram_plugin2.png | Bin 0 -> 55418 bytes _docs/img/settings.png | Bin 0 -> 3094 bytes _docs/img/student_hive.png | Bin 0 -> 134755 bytes _docs/install/001-drill-in-10.md | 365 ++++++++++++++++ _docs/install/002-deploy.md | 89 ++++ _docs/install/003-install-embedded.md | 23 + _docs/install/004-install-distributed.md | 55 +++ .../install-embedded/001-install-linux.md | 22 + .../install/install-embedded/002-install-mac.md | 29 ++ .../install/install-embedded/003-install-win.md | 51 +++ _docs/interfaces/001-odbc-win.md | 37 ++ _docs/interfaces/002-odbc-linux.md | 13 + _docs/interfaces/003-jdbc-squirrel.md | 151 +++++++ .../odbc-linux/001-install-odbc-linux.md | 105 +++++ .../odbc-linux/002-install-odbc-mac.md 
| 70 +++ .../odbc-linux/003-odbc-connections-linux.md | 178 ++++++++ .../odbc-linux/004-odbc-driver-conf.md | 15 + .../odbc-linux/005-odbc-connect-str.md | 23 + .../interfaces/odbc-linux/006-odbc-adv-prop.md | 19 + .../odbc-linux/007-odbc-connections-test.md | 40 ++ .../interfaces/odbc-win/001-install-odbc-win.md | 58 +++ _docs/interfaces/odbc-win/002-conf-odbc-win.md | 143 +++++++ .../interfaces/odbc-win/003-connect-odbc-win.md | 23 + .../interfaces/odbc-win/004-tableau-examples.md | 245 +++++++++++ _docs/interfaces/odbc-win/005-browse-view.md | 49 +++ _docs/manage/001-conf.md | 14 + _docs/manage/002-start-stop.md | 45 ++ _docs/manage/003-ports.md | 9 + _docs/manage/004-partition-prune.md | 75 ++++ _docs/manage/005-monitor-cancel.md | 30 ++ _docs/manage/conf/001-mem-alloc.md | 31 ++ _docs/manage/conf/002-startup-opt.md | 50 +++ _docs/manage/conf/003-plan-exec.md | 37 ++ _docs/manage/conf/004-persist-conf.md | 93 ++++ _docs/progress/001-2014-q1.md | 174 ++++++++ _docs/query/001-query-fs.md | 35 ++ _docs/query/002-query-hbase.md | 151 +++++++ _docs/query/003-query-complex.md | 56 +++ _docs/query/004-query-hive.md | 45 ++ _docs/query/005-query-info-skema.md | 109 +++++ _docs/query/006-query-sys-tbl.md | 159 +++++++ _docs/query/query-complex/001-sample-donuts.md | 40 ++ _docs/query/query-complex/002-query1-select.md | 19 + _docs/query/query-complex/003-query2-use-sql.md | 58 +++ .../query/query-complex/004-query3-sel-nest.md | 45 ++ .../query-complex/005-query4-sel-multiple.md | 24 ++ _docs/query/query-fs/001-query-json.md | 41 ++ _docs/query/query-fs/002-query-parquet.md | 99 +++++ _docs/query/query-fs/003-query-text.md | 119 ++++++ _docs/query/query-fs/004-query-dir.md | 90 ++++ _docs/rn/001-0.5.0rn.md | 29 ++ _docs/rn/002-0.4.0rn.md | 42 ++ _docs/rn/003-alpha-rn.md | 39 ++ _docs/rn/004-0.6.0-rn.md | 32 ++ _docs/rn/005-0.7.0-rn.md | 56 +++ _docs/sql-ref/001-data-types.md | 77 ++++ _docs/sql-ref/002-operators.md | 70 +++ _docs/sql-ref/003-functions.md | 185 ++++++++ 
_docs/sql-ref/004-nest-functions.md | 10 + _docs/sql-ref/005-cmd-summary.md | 9 + _docs/sql-ref/006-reserved-wds.md | 16 + .../sql-ref/cmd-summary/001-create-table-as.md | 134 ++++++ _docs/sql-ref/cmd-summary/002-explain.md | 166 ++++++++ _docs/sql-ref/cmd-summary/003-select.md | 85 ++++ _docs/sql-ref/cmd-summary/004-show-files.md | 65 +++ _docs/sql-ref/data-types/001-date.md | 148 +++++++ _docs/sql-ref/nested/001-flatten.md | 89 ++++ _docs/sql-ref/nested/002-kvgen.md | 150 +++++++ _docs/sql-ref/nested/003-repeated-cnt.md | 33 ++ _docs/tutorial/001-install-sandbox.md | 33 ++ _docs/tutorial/002-get2kno-sb.md | 232 ++++++++++ _docs/tutorial/003-lesson1.md | 396 +++++++++++++++++ _docs/tutorial/004-lesson2.md | 388 +++++++++++++++++ _docs/tutorial/005-lesson3.md | 379 +++++++++++++++++ _docs/tutorial/006-summary.md | 13 + .../install-sandbox/001-install-mapr-vm.md | 49 +++ .../install-sandbox/002-install-mapr-vb.md | 64 +++ 259 files changed, 9900 insertions(+), 9352 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/.gitignore ---------------------------------------------------------------------- diff --git a/.gitignore b/.gitignore index 1520ec3..bdf6a75 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ _site/* blog/_drafts/* .sass-cache/* +.DS_Store http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/001-arch.md ---------------------------------------------------------------------- diff --git a/_docs/001-arch.md b/_docs/001-arch.md new file mode 100644 index 0000000..0905ad3 --- /dev/null +++ b/_docs/001-arch.md @@ -0,0 +1,49 @@ +--- +title: "Architectural Overview" +--- +Apache Drill is a low latency distributed query engine for large-scale +datasets, including structured and semi-structured/nested data. 
Inspired by +Google's Dremel, Drill is designed to scale to thousands of nodes and +query petabytes of data at interactive speeds that BI/Analytics environments +require. + +## High-Level Architecture + +Drill includes a distributed execution environment, purpose-built for large-scale +data processing. At the core of Apache Drill is the "Drillbit" service, +which is responsible for accepting requests from the client, processing the +queries, and returning results to the client. + +A Drillbit service can be installed and run on all of the required nodes in a +Hadoop cluster to form a distributed cluster environment. When a Drillbit runs +on each data node in the cluster, Drill can maximize data locality during +query execution without moving data over the network or between nodes. Drill +uses ZooKeeper to maintain cluster membership and health-check information. + +Though Drill works in a Hadoop cluster environment, Drill is not tied to +Hadoop and can run in any distributed cluster environment. The only +prerequisite for Drill is ZooKeeper. + +## Query Flow in Drill + +The following image represents the flow of a Drill query: + +![drill query flow]({{ site.baseurl }}/docs/img/queryFlow.png) + +The flow of a Drill query typically involves the following steps: + + 1. The Drill client issues a query. Any Drillbit in the cluster can accept queries from clients. There is no master-slave concept. + 2. The Drillbit parses the query and generates an optimized distributed query plan for fast and efficient execution. + 3. The Drillbit that accepts the query becomes the driving Drillbit node for the request. It gets a list of available Drillbit nodes in the cluster from ZooKeeper. The driving Drillbit determines the appropriate nodes to execute various query plan fragments to maximize data locality. + 4. The Drillbit schedules the execution of query fragments on individual nodes according to the execution plan. + 5. 
The individual nodes finish their execution and return data to the driving Drillbit. + 6. The driving Drillbit returns results to the client. + +## Drill Clients + +You can access Drill through the following interfaces: + + * [Drill shell (SQLLine)](/drill/docs/starting-stopping-drill) + * [Drill Web UI](/drill/docs/monitoring-and-canceling-queries-in-the-drill-web-ui) + * [ODBC/JDBC](/drill/docs/odbc-jdbc-interfaces/#using-odbc-to-access-apache-drill-from-bi-tools) + * C++ API \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/001-drill-docs.md ---------------------------------------------------------------------- diff --git a/_docs/001-drill-docs.md b/_docs/001-drill-docs.md deleted file mode 100644 index 382e2e1..0000000 --- a/_docs/001-drill-docs.md +++ /dev/null @@ -1,4 +0,0 @@ ---- -title: "Apache Drill Documentation" ---- -The Drill documentation covers how to install, configure, and use Apache Drill. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/002-tutorial.md ---------------------------------------------------------------------- diff --git a/_docs/002-tutorial.md b/_docs/002-tutorial.md new file mode 100644 index 0000000..14cae80 --- /dev/null +++ b/_docs/002-tutorial.md @@ -0,0 +1,51 @@ +--- +title: "Apache Drill Tutorial" +--- +This tutorial uses the MapR Sandbox, which is a Hadoop environment pre-configured +with Apache Drill. 
+ +To complete the tutorial on the MapR Sandbox with Apache Drill, work through +the following pages in order: + + * [Installing the Apache Drill Sandbox](/drill/docs/installing-the-apache-drill-sandbox) + * [Getting to Know the Drill Sandbox](/drill/docs/getting-to-know-the-drill-sandbox) + * [Lesson 1: Learn About the Data Set](/drill/docs/lession-1-learn-about-the-data-set) + * [Lesson 2: Run Queries with ANSI SQL](/drill/docs/lession-2-run-queries-with-ansi-sql) + * [Lesson 3: Run Queries on Complex Data Types](/drill/docs/lession-3-run-queries-on-complex-data-types) + * [Summary](/drill/docs/summary) + +## About Apache Drill + +Drill is an Apache open-source SQL query engine for Big Data exploration. +Drill is designed from the ground up to support high-performance analysis on +the semi-structured and rapidly evolving data coming from modern Big Data +applications, while still providing the familiarity and ecosystem of ANSI SQL, +the industry-standard query language. Drill provides plug-and-play integration +with existing Apache Hive and Apache HBase deployments. Apache Drill 0.5 offers +the following key features: + + * Low-latency SQL queries + * Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore. + * ANSI SQL + * Nested data support + * Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs) + * BI/SQL tool integration using standard JDBC/ODBC drivers + +## MapR Sandbox with Apache Drill + +MapR includes Apache Drill as part of the Hadoop distribution. The MapR +Sandbox with Apache Drill is a fully functional single-node cluster that can +be used to get an overview of Apache Drill in a Hadoop environment. 
Business +and technical analysts, product managers, and developers can use the sandbox +environment to get a feel for the power and capabilities of Apache Drill by +performing various types of queries. Once you get a flavor for the technology, +refer to the [Apache Drill web site](http://incubator.apache.org/drill/) and +[Apache Drill documentation](/drill/docs) for more +details. + +Note that Hadoop is not a prerequisite for Drill and users can start ramping +up with Drill by running SQL queries directly on the local file system. Refer +to [Apache Drill in 10 minutes](/drill/docs/apache-drill-in-10-minutes) for an introduction to using Drill in local +(embedded) mode. + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/003-yelp.md ---------------------------------------------------------------------- diff --git a/_docs/003-yelp.md b/_docs/003-yelp.md new file mode 100644 index 0000000..b65359e --- /dev/null +++ b/_docs/003-yelp.md @@ -0,0 +1,412 @@ +--- +title: "Analyzing Yelp JSON Data with Apache Drill" +--- +[Apache Drill](https://www.mapr.com/products/apache-drill) is one of the +fastest-growing open source projects, with the community making rapid progress +with monthly releases. The key difference is Drill's agility and flexibility. +Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low-latency +performance at scale, Drill allows users to analyze the data without +any ETL or up-front schema definitions. The data could be in any file format +such as text, JSON, or Parquet. Data could have simple types such as string, +integer, dates, or more complex multi-structured data, such as nested maps and +arrays. Data can exist in any file system, local or distributed, such as HDFS, +[MapR FS](https://www.mapr.com/blog/comparing-mapr-fs-and-hdfs-nfs-and-snapshots), or S3. Drill has a "no schema" approach, which enables you to get +value from your data in just a few minutes. 
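The "no schema" approach means Drill reads each record's structure from the data itself, much like schema-on-read JSON parsing. A minimal stdlib Python sketch of that idea (the two records are invented for illustration; this is not Drill code):

```python
import json

# Two hypothetical lines of a Yelp-style JSON file. The records deliberately
# carry different fields, and no schema is declared anywhere up front.
lines = [
    '{"name": "Eric Goldberg, MD", "stars": 3.5, "city": "Phoenix"}',
    '{"name": "Pine Cone Restaurant", "review_count": 26}',
]

# Each record is parsed on read; its structure comes from the data itself.
rows = [json.loads(line) for line in lines]

# A field a record lacks simply comes back as None, much like a null column.
summary = [(r["name"], r.get("stars"), r.get("review_count")) for r in rows]
print(summary)
```

Drill applies the same idea at scale: the query engine discovers each record's fields and types during execution instead of requiring a metastore definition first.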
+ +Let's quickly walk through the steps required to install Drill and run it +against the Yelp data set. The publicly available data set used for this +example is downloadable from [Yelp](http://www.yelp.com/dataset_challenge) +(business reviews) and is in JSON format. + +## Installing and Starting Drill + +### Step 1: Download Apache Drill onto your local machine + +[http://incubator.apache.org/drill/download/](http://incubator.apache.org/drill/download/) + +You can also [deploy Drill in clustered mode](/drill/docs/deploying-apache-drill-in-a-clustered-environment) if you +want to scale your environment. + +### Step 2: Open the Drill tar file + + tar -xvf apache-drill-0.6.0-incubating.tar + +### Step 3: Launch sqlline, a JDBC application that ships with Drill + + bin/sqlline -u jdbc:drill:zk=local + +That's it! You are now ready to explore the data. + +Let's try out some SQL examples to understand how Drill makes raw data +analysis extremely easy. + +**Note**: You need to substitute your local path to the Yelp data set in the FROM clause of each query you run. + +## Querying Data with Drill + +### **1\. 
View the contents of the Yelp business data** + + 0: jdbc:drill:zk=local> !set maxwidth 10000 + + 0: jdbc:drill:zk=local> select * from + dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` + limit 1; + + +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ + | business_id | full_address | hours | open | categories | city | review_count | name | longitude | state | stars | latitude | attributes | type | neighborhoods | + +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ + | vcNAWiLM4dR7D2nwwJ7nCA | 4840 E Indian School Rd + Ste 101 + Phoenix, AZ 85018 | {"Tuesday":{"close":"17:00","open":"08:00"},"Friday":{"close":"17:00","open":"08:00"},"Monday":{"close":"17:00","open":"08:00"},"Wednesday":{"close":"17:00","open":"08:00"},"Thursday":{"close":"17:00","open":"08:00"},"Sunday":{},"Saturday":{}} | true | ["Doctors","Health & Medical"] | Phoenix | 7 | Eric Goldberg, MD | -111.983758 | AZ | 3.5 | 33.499313 | {"By Appointment Only":true,"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | business | [] | + +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ + +**Note:** You can directly query self-describing files such as JSON, Parquet, and text. There is no need to create metadata definitions in the Hive metastore. + +### **2\. 
Explore the business data set further** + +#### Total reviews in the data set + + 0: jdbc:drill:zk=local> select sum(review_count) as totalreviews + from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json`; + + +--------------+ + | totalreviews | + +--------------+ + | 1236445 | + +--------------+ + +#### Top states and cities in total number of reviews + + 0: jdbc:drill:zk=local> select state, city, count(*) totalreviews + from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` + group by state, city order by count(*) desc limit 10; + + +------------+------------+--------------+ + | state | city | totalreviews | + +------------+------------+--------------+ + | NV | Las Vegas | 12021 | + | AZ | Phoenix | 7499 | + | AZ | Scottsdale | 3605 | + | EDH | Edinburgh | 2804 | + | AZ | Mesa | 2041 | + | AZ | Tempe | 2025 | + | NV | Henderson | 1914 | + | AZ | Chandler | 1637 | + | WI | Madison | 1630 | + | AZ | Glendale | 1196 | + +------------+------------+--------------+ + +#### **Average number of reviews per business star rating** + + 0: jdbc:drill:zk=local> select stars,trunc(avg(review_count)) reviewsavg + from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` + group by stars order by stars desc; + + +------------+------------+ + | stars | reviewsavg | + +------------+------------+ + | 5.0 | 8.0 | + | 4.5 | 28.0 | + | 4.0 | 48.0 | + | 3.5 | 35.0 | + | 3.0 | 26.0 | + | 2.5 | 16.0 | + | 2.0 | 11.0 | + | 1.5 | 9.0 | + | 1.0 | 4.0 | + +------------+------------+ + +#### **Top businesses with high review counts (> 1000)** + + 0: jdbc:drill:zk=local> select name, state, city, `review_count` from + dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` + where review_count > 1000 order by `review_count` desc limit 10; + + +------------+------------+------------+----------------------------+ + | name | state | city | review_count | + 
+------------+------------+------------+----------------------------+ + | Mon Ami Gabi | NV | Las Vegas | 4084 | + | Earl of Sandwich | NV | Las Vegas | 3655 | + | Wicked Spoon | NV | Las Vegas | 3408 | + | The Buffet | NV | Las Vegas | 2791 | + | Serendipity 3 | NV | Las Vegas | 2682 | + | Bouchon | NV | Las Vegas | 2419 | + | The Buffet at Bellagio | NV | Las Vegas | 2404 | + | Bacchanal Buffet | NV | Las Vegas | 2369 | + | The Cosmopolitan of Las Vegas | NV | Las Vegas | 2253 | + | Aria Hotel & Casino | NV | Las Vegas | 2224 | + +------------+------------+------------+----------------------------+ + +#### **Saturday open and close times for a few businesses** + + 0: jdbc:drill:zk=local> select b.name, b.hours.Saturday.`open`, + b.hours.Saturday.`close` + from + dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` + b limit 10; + + +------------+------------+----------------------------+ + | name | EXPR$1 | EXPR$2 | + +------------+------------+----------------------------+ + | Eric Goldberg, MD | 08:00 | 17:00 | + | Pine Cone Restaurant | null | null | + | Deforest Family Restaurant | 06:00 | 22:00 | + | Culver's | 10:30 | 22:00 | + | Chang Jiang Chinese Kitchen| 11:00 | 22:00 | + | Charter Communications | null | null | + | Air Quality Systems | null | null | + | McFarland Public Library | 09:00 | 20:00 | + | Green Lantern Restaurant | 06:00 | 02:00 | + | Spartan Animal Hospital | 07:30 | 18:00 | + +------------+------------+----------------------------+ + +Note how Drill can traverse and refer to multiple levels of nesting. + +### **3\. Get the amenities of each business in the data set** + +Note that the attributes column in the Yelp business data set has a different +element for every row, reflecting that businesses can have different +amenities. Drill makes it easy to quickly access data sets with changing +schemas. + +First, change Drill to work in all text mode (so we can take a look at all of +the data). 
+ + 0: jdbc:drill:zk=local> alter system set `store.json.all_text_mode` = true; + +------------+-----------------------------------+ + | ok | summary | + +------------+-----------------------------------+ + | true | store.json.all_text_mode updated. | + +------------+-----------------------------------+ + +Then, query the attributes data. + + 0: jdbc:drill:zk=local> select attributes from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 10; + +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | attributes | + +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | {"By Appointment Only":"true","Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"true","dinner":"false","breakfast":"false","brunch":"false"},"Caters":"false","Noise Level":"averag | + | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"false","dinner":"false","breakfast":"false","brunch":"true"},"Caters":"false","Noise Level":"quiet" | + | {"Take-out":"true","Good For":{},"Takes Reservations":"false","Delivery":"false","Ambience":{},"Parking":{"garage":"false","street":"false","validated":"false","lot":"true","val | + | {"Take-out":"true","Good For":{},"Ambience":{},"Parking":{},"Has TV":"false","Outdoor Seating":"false","Attire":"casual","Music":{},"Hair Types Specialized In":{},"Payment Types | + | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment 
Types":{},"Dietary Restrictions":{}} | + | {"Good For":{},"Ambience":{},"Parking":{},"Wi-Fi":"free","Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"false","dinner":"true","breakfast":"false","brunch":"false"},"Noise Level":"average","Takes Reserva | + | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | + +------------+ + +Turn off all text mode so we can continue to perform arithmetic operations +on the data. + + 0: jdbc:drill:zk=local> alter system set `store.json.all_text_mode` = false; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | store.json.all_text_mode updated. | + +------------+------------+ + +### **4\. Explore the restaurant businesses in the data set** + +#### **Number of restaurants in the data set** + + 0: jdbc:drill:zk=local> select count(*) as TotalRestaurants from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants'); + +------------------+ + | TotalRestaurants | + +------------------+ + | 14303 | + +------------------+ + +#### **Top restaurants in number of reviews** + + 0: jdbc:drill:zk=local> select name,state,city,`review_count` from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by `review_count` desc limit 10 + . . . . . . . . . . . 
> ; + +------------+------------+------------+--------------+ + | name | state | city | review_count | + +------------+------------+------------+--------------+ + | Mon Ami Gabi | NV | Las Vegas | 4084 | + | Earl of Sandwich | NV | Las Vegas | 3655 | + | Wicked Spoon | NV | Las Vegas | 3408 | + | The Buffet | NV | Las Vegas | 2791 | + | Serendipity 3 | NV | Las Vegas | 2682 | + | Bouchon | NV | Las Vegas | 2419 | + | The Buffet at Bellagio | NV | Las Vegas | 2404 | + | Bacchanal Buffet | NV | Las Vegas | 2369 | + | Hash House A Go Go | NV | Las Vegas | 2201 | + | Mesa Grill | NV | Las Vegas | 2004 | + +------------+------------+------------+--------------+ + +#### **Top restaurants in number of listed categories** + + 0: jdbc:drill:zk=local> select name,repeated_count(categories) as categorycount, categories from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by repeated_count(categories) desc limit 10; + +------------+---------------+------------+ + | name | categorycount | categories | + +------------+---------------+------------+ + | Binion's Hotel & Casino | 10 | ["Arts & Entertainment","Restaurants","Bars","Casinos","Event Planning & Services","Lounges","Nightlife","Hotels & Travel","American (N | + | Stage Deli | 10 | ["Arts & Entertainment","Food","Hotels","Desserts","Delis","Casinos","Sandwiches","Hotels & Travel","Restaurants","Event Planning & Services"] | + | Jillian's | 9 | ["Arts & Entertainment","American (Traditional)","Music Venues","Bars","Dance Clubs","Nightlife","Bowling","Active Life","Restaurants"] | + | Hotel Chocolat | 9 | ["Coffee & Tea","Food","Cafes","Chocolatiers & Shops","Specialty Food","Event Planning & Services","Hotels & Travel","Hotels","Restaurants"] | + | Hotel du Vin & Bistro Edinburgh | 9 | ["Modern European","Bars","French","Wine Bars","Event Planning & Services","Nightlife","Hotels & Travel","Hotels","Restaurants" | + | Elixir | 9 | ["Arts 
& Entertainment","American (Traditional)","Music Venues","Bars","Cocktail Bars","Nightlife","American (New)","Local Flavor","Restaurants"] | + | Tocasierra Spa and Fitness | 8 | ["Beauty & Spas","Gyms","Medical Spas","Health & Medical","Fitness & Instruction","Active Life","Day Spas","Restaurants"] | + | Costa Del Sol At Sunset Station | 8 | ["Steakhouses","Mexican","Seafood","Event Planning & Services","Hotels & Travel","Italian","Restaurants","Hotels"] | + | Scottsdale Silverado Golf Club | 8 | ["Fashion","Shopping","Sporting Goods","Active Life","Golf","American (New)","Sports Wear","Restaurants"] | + | House of Blues | 8 | ["Arts & Entertainment","Music Venues","Restaurants","Hotels","Event Planning & Services","Hotels & Travel","American (New)","Nightlife"] | + +------------+---------------+------------+ + +#### **Top first categories in number of review counts** + + 0: jdbc:drill:zk=local> select categories[0], count(categories[0]) as categorycount + from dfs.`/users/nrentachintala/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json` + group by categories[0] + order by count(categories[0]) desc limit 10; + +------------+---------------+ + | EXPR$0 | categorycount | + +------------+---------------+ + | Food | 4294 | + | Shopping | 1885 | + | Active Life | 1676 | + | Bars | 1366 | + | Local Services | 1351 | + | Mexican | 1284 | + | Hotels & Travel | 1283 | + | Fast Food | 963 | + | Arts & Entertainment | 906 | + | Hair Salons | 901 | + +------------+---------------+ + +### **5\. 
Explore the Yelp reviews dataset and combine it with the businesses dataset.** + +#### **Take a look at the contents of the Yelp reviews dataset.** + + 0: jdbc:drill:zk=local> select * + from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` limit 1; + +------------+------------+------------+------------+------------+------------+------------+-------------+ + | votes | user_id | review_id | stars | date | text | type | business_id | + +------------+------------+------------+------------+------------+------------+------------+-------------+ + | {"funny":0,"useful":2,"cool":1} | Xqd0DzHaiyRqVH3WRG7hzg | 15SdjuK7DmYqUAj6rjGowg | 5 | 2007-05-17 | dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank. | review | vcNAWiLM4dR7D2nwwJ7nCA | + +------------+------------+------------+------------+------------+------------+------------+-------------+ + +#### **Top businesses with cool-rated reviews** + +Note that we are combining the Yelp business dataset, which has the overall +review_count, with the Yelp review data, which holds additional details on each +of the reviews themselves.
+ + 0: jdbc:drill:zk=local> Select b.name + from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b + where b.business_id in (SELECT r.business_id + FROM dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r + GROUP BY r.business_id having sum(r.votes.cool) > 2000 + order by sum(r.votes.cool) desc); + +------------+ + | name | + +------------+ + | Earl of Sandwich | + | XS Nightclub | + | The Cosmopolitan of Las Vegas | + | Wicked Spoon | + +------------+ + +**Create a view with the combined business and reviews data sets** + +Note that Drill views are lightweight, and can just be created in the local +file system. Drill in standalone mode comes with a dfs.tmp workspace, which we +can use to create views (or you can define your own workspaces on a local +or distributed file system). If you want to persist the data physically +instead of in a logical view, you can use CREATE TABLE AS SELECT syntax. + + 0: jdbc:drill:zk=local> create or replace view dfs.tmp.businessreviews as + Select b.name,b.stars,b.state,b.city,r.votes.funny,r.votes.useful,r.votes.cool, r.`date` + from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b, dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r + where r.business_id=b.business_id; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | View 'businessreviews' created successfully in 'dfs.tmp' schema | + +------------+------------+ + +Let's get the total number of records from the view. + + 0: jdbc:drill:zk=local> select count(*) as Total from dfs.tmp.businessreviews; + +------------+ + | Total | + +------------+ + | 1125458 | + +------------+ + +In addition to these queries, you can get many deeper insights using +Drill's [SQL functionality](/drill/docs/sql-reference).
If you are not comfortable with writing queries manually, you +can use BI/analytics tools such as Tableau or MicroStrategy to query raw +files/Hive/HBase data or Drill-created views directly using Drill [ODBC/JDBC +drivers](/drill/docs/odbc-jdbc-interfaces). + +The goal of Apache Drill is to provide the freedom and flexibility in +exploring data in ways we have never seen before with SQL technologies. The +community is working on more exciting features around nested data and +supporting data with changing schemas in upcoming releases. + +As an example, a new FLATTEN function is in development (an upcoming feature +in 0.7). This function can be used to dynamically rationalize semi-structured +data so you can apply even deeper SQL functionality. Here is a sample query: + +#### **Get a flattened list of categories for each business** + + 0: jdbc:drill:zk=local> select name, flatten(categories) as category + from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 20; + +------------+------------+ + | name | category | + +------------+------------+ + | Eric Goldberg, MD | Doctors | + | Eric Goldberg, MD | Health & Medical | + | Pine Cone Restaurant | Restaurants | + | Deforest Family Restaurant | American (Traditional) | + | Deforest Family Restaurant | Restaurants | + | Culver's | Food | + | Culver's | Ice Cream & Frozen Yogurt | + | Culver's | Fast Food | + | Culver's | Restaurants | + | Chang Jiang Chinese Kitchen | Chinese | + | Chang Jiang Chinese Kitchen | Restaurants | + | Charter Communications | Television Stations | + | Charter Communications | Mass Media | + | Air Quality Systems | Home Services | + | Air Quality Systems | Heating & Air Conditioning/HVAC | + | McFarland Public Library | Libraries | + | McFarland Public Library | Public Services & Government | + | Green Lantern Restaurant | American (Traditional) | + | Green Lantern Restaurant | Restaurants | + | Spartan Animal Hospital | Veterinarians | +
+------------+------------+ + +**Top categories used in business reviews** + + 0: jdbc:drill:zk=local> select celltbl.catl, count(celltbl.catl) categorycnt + from (select flatten(categories) catl from dfs.`/users/nrentachintala/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json` ) celltbl + group by celltbl.catl + order by count(celltbl.catl) desc limit 10 ; + +------------+-------------+ + | catl | categorycnt | + +------------+-------------+ + | Restaurants | 14303 | + | Shopping | 6428 | + | Food | 5209 | + | Beauty & Spas | 3421 | + | Nightlife | 2870 | + | Bars | 2378 | + | Health & Medical | 2351 | + | Automotive | 2241 | + | Home Services | 1957 | + | Fashion | 1897 | + +------------+-------------+ + +Stay tuned for more features and upcoming activities in the Drill community. + +To learn more about Drill, please refer to the following resources: + + * Download Drill here: <http://incubator.apache.org/drill/download/> + * 10 reasons we think Drill is cool: <http://incubator.apache.org/drill/why-drill/> + * [A simple 10-minute tutorial](/drill/docs/apache-drill-in-10-minutes) + * [A more comprehensive tutorial](/drill/docs/apache-drill-tutorial) + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/004-install.md ---------------------------------------------------------------------- diff --git a/_docs/004-install.md b/_docs/004-install.md new file mode 100644 index 0000000..9dbfdc4 --- /dev/null +++ b/_docs/004-install.md @@ -0,0 +1,13 @@ +--- +title: "Install Drill" +--- +You can install Drill in embedded mode or in distributed mode. Installing +Drill in embedded mode does not require any configuration, which means that +you can quickly get started with Drill. If you want to use Drill in a +clustered Hadoop environment, you can install Drill in distributed mode.
+Installing in distributed mode requires some configuration; however, once you +install, you can connect Drill to your Hive, HBase, or distributed file system +data sources and run queries on them. + + + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/005-connect.md ---------------------------------------------------------------------- diff --git a/_docs/005-connect.md b/_docs/005-connect.md new file mode 100644 index 0000000..b48d200 --- /dev/null +++ b/_docs/005-connect.md @@ -0,0 +1,41 @@ +--- +title: "Connect to Data Sources" +--- +Apache Drill serves as a query layer that connects to data sources through +storage plugins. Drill uses the storage plugins to interact with data sources. +You can think of a storage plugin as a connection between Drill and a data +source. + +The following image represents the storage plugin layer between Drill and a +data source: + +![drill query flow]({{ site.baseurl }}/docs/img/storageplugin.png) + +Storage plugins provide the following information to Drill: + + * Metadata available in the underlying data source + * Location of data + * Interfaces that Drill can use to read from and write to data sources + * A set of storage plugin optimization rules that assist with efficient, faster execution of Drill queries, such as pushdowns, statistics, and partition awareness + +Storage plugins perform scanner and writer functions, and inform the metadata +repository of any known metadata, such as: + + * Schema + * File size + * Data ordering + * Secondary indices + * Number of blocks + +Storage plugins inform the execution engine of any native capabilities, such +as predicate pushdown, joins, and SQL. + +Drill provides storage plugins for files and HBase/M7. Drill also integrates +with Hive through a storage plugin. Hive provides a metadata abstraction layer +on top of files and HBase/M7.
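To make this concrete, a file system storage plugin is defined as a small JSON document (typically registered through the Drill Web UI). The sketch below is illustrative only: it assumes a local file system plugin with a single hypothetical workspace named `yelp`, and the exact attributes available may vary by Drill version:

```json
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "yelp": {
      "location": "/users/nrentachintala/Downloads/yelp",
      "writable": false
    }
  },
  "formats": {
    "json": { "type": "json" }
  }
}
```

With a plugin like this registered under a name such as `dfs`, a query can refer to a file through the workspace, for example `dfs.yelp` followed by the file name, instead of spelling out the full path each time.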
+ +When you run Drill to query files in HBase/M7, Drill can perform direct +queries on the data or go through Hive, if you have metadata defined there. +Drill integrates with the Hive metastore for metadata and also uses a Hive +SerDe for the deserialization of records. Drill does not invoke the Hive +execution engine for any requests. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/006-interfaces.md ---------------------------------------------------------------------- diff --git a/_docs/006-interfaces.md b/_docs/006-interfaces.md new file mode 100644 index 0000000..ce068a6 --- /dev/null +++ b/_docs/006-interfaces.md @@ -0,0 +1,50 @@ +--- +title: "ODBC/JDBC Interfaces" +--- +You can connect to Apache Drill through the following interfaces: + + * Drill shell (SQLLine) + * Drill Web UI + * [ODBC](/drill/docs/odbc-jdbc-interfaces#using-odbc-to-access-apache-drill-from-bi-tools)* + * [JDBC](/drill/docs/odbc-jdbc-interfaces#using-jdbc-to-access-apache-drill-from-squirrel) + * C++ API + +*Apache Drill does not have an open source ODBC driver. However, MapR provides an ODBC driver that you can use to connect to Apache Drill from BI tools. + +## Using ODBC to Access Apache Drill from BI Tools + +MapR provides ODBC drivers for Windows, Mac OS X, and Linux. It is recommended +that you install the latest version of Apache Drill with the latest version of +the Drill ODBC driver. + +For example, if you have Apache Drill 0.5 and a Drill ODBC driver installed on +your machine, and then you upgrade to Apache Drill 0.6, do not assume that the +Drill ODBC driver installed on your machine will work with the new version of +Apache Drill. Install the latest available Drill ODBC driver to ensure that +the two components work together. 
+ +You can access the latest Drill ODBC drivers in the following location: + +<http://package.mapr.com/tools/MapR-ODBC/MapR_Drill/MapRDrill_odbc> + +## Using JDBC to Access Apache Drill from SQuirrel + +You can connect to Drill through a JDBC client tool, such as SQuirreL, on +Windows, Linux, and Mac OS X systems, to access all of your data sources +registered with Drill. An embedded JDBC driver is included with Drill. +Configure the JDBC driver in the SQuirreL client to connect to Drill from +SQuirreL. This document provides instruction for connecting to Drill from +SQuirreL on Windows. + +To use the Drill JDBC driver with SQuirreL on Windows, complete the following +steps: + + * [Step 1: Getting the Drill JDBC Driver](/drill/docs/using-the-jdbc-driver#step-1-getting-the-drill-jdbc-driver) + * [Step 2: Installing and Starting SQuirreL](/drill/docs/using-the-jdbc-driver#step-2-installing-and-starting-squirrel) + * [Step 3: Adding the Drill JDBC Driver to SQuirreL](/drill/docs/using-the-jdbc-driver#step-3-adding-the-drill-jdbc-driver-to-squirrel) + * [Step 4: Running a Drill Query from SQuirreL](/drill/docs/using-the-jdbc-driver#step-4-running-a-drill-query-from-squirrel) + +For information about how to use SQuirreL, refer to the [SQuirreL Quick +Start](http://squirrel-sql.sourceforge.net/user-manual/quick_start.html) +guide. + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/007-query.md ---------------------------------------------------------------------- diff --git a/_docs/007-query.md b/_docs/007-query.md new file mode 100644 index 0000000..bf58f0b --- /dev/null +++ b/_docs/007-query.md @@ -0,0 +1,41 @@ +--- +title: "Query Data" +--- +You can query local and distributed file systems, Hive, and HBase data sources +registered with Drill. If you connected directly to a particular schema when +you invoked SQLLine, you can issue SQL queries against that schema. 
If you did +not indicate a schema when you invoked SQLLine, you can issue the `USE +<schema>` statement to run your queries against a particular schema. After you +issue the `USE` statement, you can use absolute notation, such as `schema.table.column`. + +You may need to use casting functions in some queries. For example, you may +have to cast a string `"100"` to an integer in order to apply a math function +or an aggregate function. + +You can use the EXPLAIN command to analyze errors and troubleshoot queries +that do not run. For example, if you run into a casting error, the query plan +text may help you isolate the problem. + + 0: jdbc:drill:zk=local> !set maxwidth 10000 + 0: jdbc:drill:zk=local> explain plan for select ... ; + +The `!set` command increases the default text display (number of characters). By +default, most of the plan output is hidden. + +You may see errors if you try to use non-standard or unsupported SQL syntax in +a query. + +Remember the following tips when querying data with Drill: + + * Include a semicolon at the end of SQL statements, except when you issue a command with an exclamation point (`!`). Example: `!set maxwidth 10000` + * Use backticks around file and directory names that contain special characters and also around reserved words when you query a file system. + The following special characters require backticks: + + * . (period) + * / (forward slash) + * _ (underscore) + Example: ``SELECT * FROM dfs.default.`sample_data/my_sample.json`; `` + * `CAST` data to `VARCHAR` if an expression in a query returns `VARBINARY` as the result type in order to view the `VARBINARY` types as readable data. If you do not use the `CAST` function, Drill returns the results as byte data.
+ Example: `CAST (VARBINARY_expr as VARCHAR(50))` + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/008-sql-ref.md ---------------------------------------------------------------------- diff --git a/_docs/008-sql-ref.md b/_docs/008-sql-ref.md new file mode 100644 index 0000000..81bcbab --- /dev/null +++ b/_docs/008-sql-ref.md @@ -0,0 +1,14 @@ +--- +title: "SQL Reference" +--- +Drill supports the ANSI standard for SQL. You can use SQL to query your Hive, +HBase, and distributed file system data sources. Drill can discover the form +of the data when you submit a query. You can query text files and nested data +formats, such as JSON and Parquet. Drill provides special operators and +functions that you can use to _drill down_ into nested data formats. + +Drill queries do not require information about the data that you are trying to +access, regardless of its source system or its schema and data types. The +sweet spot for Apache Drill is a SQL query workload against "complex data": +data made up of various types of records and fields, rather than data in a +recognizable relational form (discrete rows and columns). http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/009-dev-custom-func.md ---------------------------------------------------------------------- diff --git a/_docs/009-dev-custom-func.md b/_docs/009-dev-custom-func.md new file mode 100644 index 0000000..f8a6445 --- /dev/null +++ b/_docs/009-dev-custom-func.md @@ -0,0 +1,37 @@ +--- +title: "Develop Custom Functions" +--- + +Drill provides a high-performance Java API with interfaces that you can +implement to develop simple and aggregate custom functions. Custom functions +are reusable SQL functions that you develop in Java to encapsulate code that +processes column values during a query. Custom functions can perform +calculations and transformations that built-in SQL operators and functions do +not provide.
Custom functions are called from within a SQL statement, like a +regular function, and return a single value. + +## Simple Function + +A simple function operates on a single row and produces a single row as the +output. When you include a simple function in a query, the function is called +once for each row in the result set. Mathematical and string functions are +examples of simple functions. + +## Aggregate Function + +Aggregate functions differ from simple functions in the number of rows that +they accept as input. An aggregate function operates on multiple input rows +and produces a single row as output. The COUNT(), MAX(), SUM(), and AVG() +functions are examples of aggregate functions. You can use an aggregate +function in a query with a GROUP BY clause to produce a result set with a +separate aggregate value for each combination of values from the GROUP BY +clause. + +## Process + +To develop custom functions that you can use in your Drill queries, you must +complete the following tasks: + + 1. Create a Java program that implements Drill's simple or aggregate interface, and compile a sources JAR file and a classes JAR file. + 2. Add the sources and classes JAR files to Drill's classpath. + 3. Add the name of the package that contains the classes to Drill's main configuration file, drill-override.conf. http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/010-manage.md ---------------------------------------------------------------------- diff --git a/_docs/010-manage.md b/_docs/010-manage.md new file mode 100644 index 0000000..ec6663b --- /dev/null +++ b/_docs/010-manage.md @@ -0,0 +1,14 @@ +--- +title: "Manage Drill" +--- +When using Drill, you may need to stop and restart a Drillbit on a node, or +modify various options. For example, the default storage format for CTAS +statements is Parquet. You can modify the default setting so that output data +is stored in CSV or JSON format.
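For example, the CTAS output format can be switched for the current session with Drill's `store.format` option; a short sketch (option name and values per Drill's standard option syntax; verify against your Drill version):

```sql
-- Store CTAS output as JSON instead of the default Parquet for this session
ALTER SESSION SET `store.format` = 'json';

-- Revert to the default
ALTER SESSION SET `store.format` = 'parquet';
```

Using `ALTER SYSTEM` instead of `ALTER SESSION` applies the change as the cluster-wide default rather than for one connection.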
+ +You can use certain SQL commands to manage Drill from within the Drill shell +(SQLLine). You can also modify Drill configuration options, such as memory +allocation, in Drill's configuration files. + + + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/011-develop.md ---------------------------------------------------------------------- diff --git a/_docs/011-develop.md b/_docs/011-develop.md new file mode 100644 index 0000000..2b9ce67 --- /dev/null +++ b/_docs/011-develop.md @@ -0,0 +1,9 @@ +--- +title: "Develop Drill" +--- +To develop Drill, you compile Drill from source code and then set up a project +in Eclipse for use as your development environment. To review or contribute to +Drill code, you must complete the steps required to install and use the Drill +patch review tool. + + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/012-rn.md ---------------------------------------------------------------------- diff --git a/_docs/012-rn.md b/_docs/012-rn.md new file mode 100644 index 0000000..f369335 --- /dev/null +++ b/_docs/012-rn.md @@ -0,0 +1,191 @@ +--- +title: "Release Notes" +--- +## Apache Drill 0.7.0 Release Notes + +Apache Drill 0.7.0, the third beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. It also continues the +Drill monthly release cycle as we drive towards general availability. + +This release is available as +[binary](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache- +drill-0.7.0.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache- +drill-0.7.0-src.tar.gz) tarballs that are compiled against Apache Hadoop. +Drill has been tested against MapR, Cloudera, and Hortonworks Hadoop +distributions. 
There are associated build profiles and JIRAs that can help you +run Drill against your preferred distribution. + +### Apache Drill 0.7.0 Key Features + + * No more dependency on UDP/Multicast - Making it possible for Drill to work well in the following scenarios: + + * UDP multicast not enabled (as in EC2) + + * Cluster spans multiple subnets + + * Cluster has a multihomed configuration + + * New functions to natively work with nested data - KVGen and Flatten + + * Support for Hive 0.13 (Hive 0.12 with Drill is not supported any more) + + * Improved performance when querying Hive tables and the file system through partition pruning + + * Improved performance for HBase with LIKE operator pushdown + + * Improved memory management + + * Drill web UI monitoring and query profile improvements + + * Ability to parse files without explicit extensions using default storage format specification + + * Fixes for dealing with complex/nested data objects in Parquet/JSON + + * Fast schema return - Improved experience working with BI/query tools by returning metadata quickly + + * Several hang-related fixes + + * Parquet writer fixes for handling large datasets + + * Stability improvements in ODBC and JDBC drivers + +### Apache Drill 0.7.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. + +## Apache Drill 0.6.0 Release Notes + +Apache Drill 0.6.0, the second beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill.
It also continues the +Drill monthly release cycle as we drive towards general availability. + +This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc +ubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incu +bating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled +against Apache Hadoop. Drill has been tested against MapR, Cloudera, and +Hortonworks Hadoop distributions. There are associated build profiles and +JIRAs that can help you run Drill against your preferred distribution. + +### Apache Drill 0.6.0 Key Features + +This release is primarily a bug fix release, with [more than 30 JIRAs closed]( +https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&vers +ion=12327472), but there are some notable features: + + * Direct ANSI SQL access to MongoDB, using the latest [MongoDB Plugin for Apache Drill](/drill/docs/mongodb-plugin-for-apache-drill) + * Filesystem query performance improvements with partition pruning + * Ability to use the file system as a persistent store for query profiles and diagnostic information + * Window function support (alpha) + +### Apache Drill 0.6.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. + +## Apache Drill 0.5.0 Release Notes + +Apache Drill 0.5.0, the first beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. 
It also continues the +Drill monthly release cycle as we drive towards general availability. + +The 0.5.0 release is primarily a bug fix release, with [more than 100 JIRAs](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12324880) closed, but there are some notable features. For information +about the features, see the [Apache Drill Blog for the 0.5.0 +release](https://blogs.apache.org/drill/entry/apache_drill_beta_release_see). + +This release is available as [binary](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled +against Apache Hadoop. Drill has been tested against MapR, Cloudera, and +Hortonworks Hadoop distributions. There are associated build profiles and +JIRAs that can help you run Drill against your preferred distribution. + +### Apache Drill 0.5.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. + * There are known issues with joining text files without using an intervening view. See [DRILL-1401](https://issues.apache.org/jira/browse/DRILL-1401) for more information. + +## Apache Drill 0.4.0 Release Notes + +The 0.4.0 release is a developer preview release, designed to help enthusiasts +start to work with and experiment with Drill. It is the first Drill release +that provides distributed query execution.
+ +This release is built upon [more than 800 +JIRAs](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324963/). +It is a pre-beta release on the way towards a beta release of Drill. As a developer snapshot, +the release contains a large number of outstanding bugs that will make some +use cases challenging. Feel free to consult outstanding issues [targeted for +the 0.5.0 +release](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324880/) +to see whether your use case is affected. + +To read more about this release and new features introduced, please view the +[0.4.0 announcement blog +entry](https://blogs.apache.org/drill/entry/announcing_apache_drill_0_4). + +The release is available as both [binary](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating.tar.gz) +and [source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating-src.tar.gz) tarballs. In both cases, +these are compiled against Apache Hadoop. Drill has also been tested against +MapR, Cloudera and Hortonworks Hadoop distributions and there are associated +build profiles or JIRAs that can help you run against your preferred +distribution. + +### Some Key Notes & Limitations + + * The current release supports in-memory and beyond-memory execution. However, users must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * In many cases, merge join operations return incorrect results. + * Use of a local filter in a join `ON` clause when using left, right, or full outer joins may result in incorrect results. + * Because of known memory leaks and memory overrun issues, you may need more memory and you may need to restart the system in some cases. + * Some types of complex expressions, especially those involving empty arrays, may fail or return incorrect results.
+ * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior (such as Sort). Other operations (such as streaming aggregate) may have partial support that leads to unexpected results. + * Protobuf, UDF, query plan interfaces and all interfaces are subject to change in incompatible ways. + * Multiplication of some types of DECIMAL(28+,*) will return incorrect results. + +## Apache Drill M1 -- Release Notes (Apache Drill Alpha) + +### Milestone 1 Goals + +The first release of Apache Drill is designed as a technology preview for +people to better understand the architecture and vision. It is a functional +release trying to piece together the key components of a next generation MPP +query engine. It is designed to allow milestone 2 (M2) to focus on +architectural analysis and performance optimization. + + * Provide a new optimistic DAG execution engine for data analysis + * Build a new columnar shredded in-memory format and execution model that minimizes data serialization/deserialization costs and operator complexity + * Provide a model for runtime generated functions and relational operators that minimizes complexity and maximizes performance + * Support queries against columnar on disk format (Parquet) and JSON + * Support the most common set of standard SQL read-only phrases using ANSI standards. Includes: SELECT, FROM, WHERE, HAVING, ORDER, GROUP BY, IN, DISTINCT, LEFT JOIN, RIGHT JOIN, INNER JOIN + * Support schema-on-read querying and execution + * Build a set of columnar operation primitives including Merge Join, Sort, Streaming Aggregate, Filter, Selection Vector removal. + * Support unlimited levels of subqueries and correlated subqueries + * Provide an extensible query-language-agnostic JSON-based logical data flow syntax.
+ * Support complex data type manipulation via logical plan operations + +### Known Issues + +SQL Parsing +Because Apache Drill is built to support late-bound changing schemas while SQL +is statically typed, there are a couple of special requirements for writing +SQL queries. These are limited to the current release and +will be corrected in a future milestone release. + + * All tables are exposed as a single map field that contains + * Drill Alpha doesn't support implicit or explicit casts outside those required above. + * Drill Alpha does not include, there are currently a couple of differences for how to write a query in In order to query against + +### UDFs + + * Drill currently supports simple and aggregate functions using scalar, repeated and + * Nested data support is incomplete. Drill Alpha supports nested data structures as well as repeated fields. However, + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/013-contribute.md ---------------------------------------------------------------------- diff --git a/_docs/013-contribute.md b/_docs/013-contribute.md new file mode 100644 index 0000000..33db231 --- /dev/null +++ b/_docs/013-contribute.md @@ -0,0 +1,9 @@ +--- +title: "Contribute to Drill" +--- +The Apache Drill community welcomes your support. Please read [Apache Drill +Contribution Guidelines](/drill/docs/apache-drill-contribution-guidelines) for information about how to contribute to +the project. If you would like to contribute to the project and need some +ideas for what to do, please read [Apache Drill Contribution +Ideas](/drill/docs/apache-drill-contribution-ideas).
+ http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/014-sample-ds.md ---------------------------------------------------------------------- diff --git a/_docs/014-sample-ds.md b/_docs/014-sample-ds.md new file mode 100644 index 0000000..7212ea0 --- /dev/null +++ b/_docs/014-sample-ds.md @@ -0,0 +1,10 @@ +--- +title: "Sample Datasets" +--- +Use any of the following sample datasets provided to test Drill: + + * [AOL Search](/drill/docs/aol-search) + * [Enron Emails](/drill/docs/enron-emails) + * [Wikipedia Edit History](/drill/docs/wikipedia-edit-history) + + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/015-design.md ---------------------------------------------------------------------- diff --git a/_docs/015-design.md b/_docs/015-design.md new file mode 100644 index 0000000..00b17e5 --- /dev/null +++ b/_docs/015-design.md @@ -0,0 +1,13 @@ +--- +title: "Design Docs" +--- +Review the Apache Drill design docs for early descriptions of Apache Drill +functionality, terms, and goals, and reference the research articles to learn +about Apache Drill's history: + + * [Drill Plan Syntax](/drill/docs/drill-plan-syntax) + * [RPC Overview](/drill/docs/rpc-overview) + * [Query Stages](/drill/docs/query-stages) + * [Useful Research](/drill/docs/useful-research) + * [Value Vectors](/drill/docs/value-vectors) + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/016-progress.md ---------------------------------------------------------------------- diff --git a/_docs/016-progress.md b/_docs/016-progress.md new file mode 100644 index 0000000..bf19a29 --- /dev/null +++ b/_docs/016-progress.md @@ -0,0 +1,8 @@ +--- +title: "Progress Reports" +--- +Review the following Apache Drill progress reports for a summary of issues, +progression of the project, summary of mailing list discussions, and events: + + * [2014 Q1 Drill Report](/drill/docs/2014-q1-drill-report) + 
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/017-archived-pages.md ---------------------------------------------------------------------- diff --git a/_docs/017-archived-pages.md b/_docs/017-archived-pages.md new file mode 100644 index 0000000..d052579 --- /dev/null +++ b/_docs/017-archived-pages.md @@ -0,0 +1,8 @@ +--- +title: "Archived Pages" +--- +The following pages have been archived: + +* How to Run Drill with Sample Data +* Meet Apache Drill + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/018-bylaws.md ---------------------------------------------------------------------- diff --git a/_docs/018-bylaws.md b/_docs/018-bylaws.md new file mode 100644 index 0000000..2c35042 --- /dev/null +++ b/_docs/018-bylaws.md @@ -0,0 +1,170 @@ +--- +title: "Project Bylaws" +--- +## Introduction + +This document defines the bylaws under which the Apache Drill project +operates. It defines the roles and responsibilities of the project, who may +vote, how voting works, how conflicts are resolved, etc. + +Drill is a project of the [Apache Software +Foundation](http://www.apache.org/foundation/). The foundation holds the +copyright on Apache code including the code in the Drill codebase. The +[foundation FAQ](http://www.apache.org/foundation/faq.html) explains the +operation and background of the foundation. + +Drill is typical of Apache projects in that it operates under a set of +principles, known collectively as the _Apache Way_. If you are new to Apache +development, please refer to the [Incubator +project](http://incubator.apache.org/) for more information on how Apache +projects operate. + +## Roles and Responsibilities + +Apache projects define a set of roles with associated rights and +responsibilities. These roles govern what tasks an individual may perform +within the project. The roles are defined in the following sections. + +### Users + +The most important participants in the project are people who use our +software. 
The majority of our contributors start out as users and guide their +development efforts from the user's perspective. + +Users contribute to Apache projects by providing feedback to contributors +in the form of bug reports and feature suggestions. Users also participate +in the Apache community by helping other users on mailing lists and user +support forums. + +### Contributors + +Contributors are all of the volunteers who contribute time, code, documentation, or +resources to the Drill project. A contributor who makes sustained, welcome +contributions to the project may be invited to become a committer, though the +exact timing of such invitations depends on many factors. + +### Committers + +The project's committers are responsible for the project's technical +management. Committers have access to a specified set of the project's code +repositories. Committers on subprojects may cast binding votes on any +technical discussion regarding that subproject. + +Committer access is by invitation only and must be approved by lazy consensus +of the active PMC members. A committer is considered _emeritus_ by his or her +own declaration or by not contributing in any form to the project for over six +months. An emeritus committer may request reinstatement of commit access from +the PMC, which will be sufficient to restore him or her to active committer +status. + +Commit access can be revoked by a unanimous vote of all the active PMC members +(except the committer in question if he or she is also a PMC member). + +All Apache committers are required to have a signed [Contributor License +Agreement (CLA)](http://www.apache.org/licenses/icla.txt) on file with the +Apache Software Foundation. There is a [Committer +FAQ](http://www.apache.org/dev/committers.html) which provides more details on +the requirements for committers. + +A committer who makes a sustained contribution to the project may be invited +to become a member of the PMC. 
The form of contribution is not limited to +code. It can also include code review, helping out users on the mailing lists, +documentation, etc. + +### Project Management Committee + +The PMC is responsible to the board and the ASF for the management and +oversight of the Apache Drill codebase. The responsibilities of the PMC +include: + + * Deciding what is distributed as products of the Apache Drill project. In particular, all releases must be approved by the PMC. + * Maintaining the project's shared resources, including the codebase repository, mailing lists, and websites. + * Speaking on behalf of the project. + * Resolving license disputes regarding products of the project. + * Nominating new PMC members and committers. + * Maintaining these bylaws and other guidelines of the project. + +Membership of the PMC is by invitation only and must be approved by a lazy +consensus of active PMC members. A PMC member is considered _emeritus_ by his +or her own declaration or by not contributing in any form to the project for +over six months. An emeritus member may request reinstatement to the PMC, +which will be sufficient to restore him or her to active PMC membership. + +Membership of the PMC can be revoked by a unanimous vote of all the active +PMC members other than the member in question. + +The chair of the PMC is appointed by the ASF board. The chair is an office +holder of the Apache Software Foundation (Vice President, Apache Drill) and +has primary responsibility to the board for the management of the projects +within the scope of the Drill PMC. The chair reports to the board quarterly on +developments within the Drill project. + +The term of the chair is one year. When the current chair's term is up, or if +the chair resigns before the end of his or her term, the PMC votes to +recommend a new chair using lazy consensus, but the decision must be ratified +by the Apache board. 
+ +## Decision Making + +Within the Drill project, different types of decisions require different forms +of approval. For example, the previous section describes several decisions +which require 'lazy consensus' approval. This section defines how voting is +performed, the types of approvals, and which types of decisions require which +type of approval. + +### Voting + +Decisions regarding the project are made by votes on the primary project +development mailing list +_[d...@drill.apache.org](mailto:d...@drill.apache.org)_. Where necessary, PMC +voting may take place on the private Drill PMC mailing list +[priv...@drill.apache.org](mailto:priv...@drill.apache.org). Votes are clearly +indicated by a subject line starting with [VOTE]. Votes may contain multiple +items for approval, and these should be clearly separated. Voting is carried +out by replying to the vote mail. Voting may take one of four flavors. + + <table ><tbody><tr><td valign="top" >Vote</td><td valign="top" >Meaning</td></tr><tr><td valign="top" >+1</td><td valign="top" >'Yes,' 'Agree,' or 'the action should be performed.' In general, this vote also indicates a willingness on behalf of the voter to help 'make it happen.'</td></tr><tr><td valign="top" >+0</td><td valign="top" >This vote indicates a willingness for the action under consideration to go ahead. The voter, however, will not be able to help.</td></tr><tr><td valign="top" >-0</td><td valign="top" >This vote indicates that the voter does not, in general, agree with the proposed action but is not concerned enough to prevent the action going ahead.</td></tr><tr><td valign="top" >-1</td><td valign="top" >This is a negative vote. On issues where consensus is required, this vote counts as a <strong>veto</strong>. All vetoes must contain an explanation of why the veto is appropriate. Vetoes with no explanation are void. It may also be appropriate for a -1 vote to include an alternative course of action.</td></tr></tbody></table> + +All participants in the Drill project are encouraged to show their agreement +with or against a particular action by voting. For technical decisions, only +the votes of active committers are binding. Non-binding votes are still useful +for those with binding votes to understand the perception of an action in the +wider Drill community. For PMC decisions, only the votes of PMC members are +binding. + +Voting can also be applied to changes already made to the Drill codebase. +These typically take the form of a veto (-1) in reply to the commit message +sent when the commit is made. Note that this should be a rare occurrence. All +efforts should be made to discuss issues while they are still patches, before +the code is committed. + +### Approvals + +These are the types of approvals that can be sought. Different actions require +different types of approvals. + +<table ><tbody><tr><td valign="top" >Approval Type</td><td valign="top" >Description</td></tr><tr><td valign="top" >Consensus</td><td valign="top" >For this to pass, all voters with binding votes must vote and there can be no binding vetoes (-1). Consensus votes are rarely required due to the impracticality of getting all eligible voters to cast a vote.</td></tr><tr><td valign="top" >Lazy Consensus</td><td valign="top" >Lazy consensus requires 3 binding +1 votes and no binding vetoes.</td></tr><tr><td valign="top" >Lazy Majority</td><td valign="top" >A lazy majority vote requires 3 binding +1 votes and more binding +1 votes than -1 votes.</td></tr><tr><td valign="top" >Lazy Approval</td><td valign="top" >An action with lazy approval is implicitly allowed unless a -1 vote is received, at which time, depending on the type of action, either lazy majority or lazy consensus approval must be obtained.</td></tr></tbody></table> + +### Vetoes + +A valid, binding veto cannot be overruled. 
If a veto is cast, it must be +accompanied by a valid reason for the veto. The +validity of a veto, if challenged, can be confirmed by anyone who has a +binding vote. This does not necessarily signify agreement with the veto - +merely that the veto is valid. + +If you disagree with a valid veto, you must lobby the person casting the veto +to withdraw his or her veto. If a veto is not withdrawn, the action that has +been vetoed must be reversed in a timely manner. + +### Actions + +This section describes the various actions which are undertaken within the +project, the corresponding approval required for that action, and those who +have binding votes over the action. It also specifies the minimum length of +time that a vote must remain open, measured in business days. In general, votes +should not be called at times when it is known that interested members of the +project will be unavailable. + +<table ><tbody><tr><td valign="top" >Action</td><td valign="top" >Description</td><td valign="top" >Approval</td><td valign="top" >Binding Votes</td><td valign="top" >Minimum Length</td></tr><tr><td valign="top" >Code Change</td><td valign="top" >A change made to a codebase of the project and committed by a committer. This includes source code, documentation, website content, etc.</td><td valign="top" >Consensus approval of active committers, with a minimum of one +1. The code can be committed after the first +1.</td><td valign="top" >Active committers</td><td valign="top" >1</td></tr><tr><td valign="top" >Release Plan</td><td valign="top" >Defines the timetable and actions for a release. 
The plan also nominates a Release Manager.</td><td valign="top" >Lazy majority</td><td valign="top" >Active committers</td><td valign="top" >3</td></tr><tr><td valign="top" >Product Release</td><td valign="top" >When a release of one of the project's products is ready, a vote is required to accept the release as an official release of the project.</td><td valign="top" >Lazy majority</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >Adoption of New Codebase</td><td valign="top" >When the codebase for an existing, released product is to be replaced with an alternative codebase. If such a vote fails to gain approval, the existing code base will continue. This also covers the creation of new sub-projects within the project.</td><td valign="top" >2/3 majority</td><td valign="top" >Active PMC members</td><td valign="top" >6</td></tr><tr><td valign="top" >New Committer</td><td valign="top" >When a new committer is proposed for the project.</td><td valign="top" >Lazy consensus</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >New PMC Member</td><td valign="top" >When a committer is proposed for the PMC.</td><td valign="top" >Lazy consensus</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >Committer Removal</td><td valign="top" >When removal of commit privileges is sought. <em>Note: Such actions will also be referred to the ASF board by the PMC chair.</em></td><td valign="top" >Consensus</td><td valign="top" >Active PMC members (excluding the committer in question if a member of the PMC).</td><td valign="top" >6</td></tr><tr><td valign="top" >PMC Member Removal</td><td valign="top" >When removal of a PMC member is sought. 
<em>Note: Such actions will also be referred to the ASF board by the PMC chair.</em></td><td valign="top" >Consensus</td><td valign="top" >Active PMC members (excluding the member in question).</td><td valign="top" >6</td></tr><tr><td valign="top" >Modifying Bylaws</td><td valign="top" >Modifying this document.</td><td valign="top" >2/3 majority</td><td valign="top" >Active PMC members</td><td valign="top" >6</td></tr></tbody></table> +