remove old Drill refs, finish config options
Project: http://git-wip-us.apache.org/repos/asf/drill/repo Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/c38e6a18 Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/c38e6a18 Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/c38e6a18 Branch: refs/heads/gh-pages Commit: c38e6a1886aa9ed6c9ff277ada028ff10d53c043 Parents: 0640c9a Author: Kristine Hahn <[email protected]> Authored: Thu May 14 12:47:53 2015 -0700 Committer: Kristine Hahn <[email protected]> Committed: Thu May 14 12:47:53 2015 -0700 ---------------------------------------------------------------------- .../010-configuration-options-introduction.md | 32 ++++++++++---------- .../080-drill-default-input-format.md | 2 +- .../030-deploying-and-using-a-hive-udf.md | 2 +- .../050-json-data-model.md | 4 +-- .../010-interfaces-introduction.md | 4 +-- ...microstrategy-analytics-with-apache-drill.md | 9 +++--- .../050-using-drill-explorer-on-windows.md | 4 +-- .../030-analyzing-the-yelp-academic-dataset.md | 5 ++- .../040-learn-drill-with-the-mapr-sandbox.md | 19 +----------- .../010-installing-the-apache-drill-sandbox.md | 2 +- 10 files changed, 32 insertions(+), 51 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md ---------------------------------------------------------------------- diff --git a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md index 587c698..4fd7948 100644 --- a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md +++ b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md @@ -3,16 +3,16 @@ title: "Configuration Options Introduction" parent: "Configuration Options" --- Drill provides many configuration options 
that you can enable, disable, or -modify. Modifying certain configuration options can impact Drill's -performance. Many of Drill's configuration options reside in the `drill- -env.sh` and `drill-override.conf` files. Drill stores these files in the +modify. Modifying certain configuration options can impact Drill +performance. Many of Drill's configuration options reside in the `drill- +env.sh` and `drill-override.conf` files in the +`/conf` directory. Drill sources `/etc/drill/conf` if it exists. Otherwise, Drill sources the local `<drill_installation_directory>/conf` directory. -The sys.options table in Drill contains information about boot (start-up) and system options. The section, ["Start-up Options"]({{site.baseurl}}/docs/start-up-options), covers how to configure and view key boot options. The sys.options table also contains many system options, some of which are described in detail in the section, ["Planning and Execution Options"]({{site.baseurl}}/docs/planning-and-execution-options). The following table lists the options in alphabetical order and provides a brief description of supported options: +The sys.options table contains information about boot (start-up), system, and session options. The section, ["Start-up Options"]({{site.baseurl}}/docs/start-up-options), covers how to configure and view key boot options. The following table lists the options in alphabetical order and provides a brief description of supported options: ## System Options -The sys.options table lists the following options that you can set as a system or session option as described in the section, ["Planning and Execution Options"]({{site.baseurl}}/docs/planning-and-execution-options) +The sys.options table lists the following options that you can set as a system or session option as described in the section, ["Planning and Execution Options"]({{site.baseurl}}/docs/planning-and-execution-options).
| Name | Default | Comments | |------------------------------------------------|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -33,8 +33,8 @@ The sys.options table lists the following options that you can set as a system o | exec.storage.enable_new_text_reader | TRUE | Enables the text reader that complies with the RFC 4180 standard for text/csv files. | | new_view_default_permissions | 700 | Sets view permissions using an octal code in the Unix tradition. | | planner.add_producer_consumer | FALSE | Increase prefetching of data from disk. Disable for in-memory reads. | -| planner.affinity_factor | 1.2 | Factor by which a node with endpoint affinity is favored while creating assignment. Accepts inputs of type DOUBLE. | -| planner.broadcast_factor | 1 | | +| planner.affinity_factor | 1.2 | Factor by which a node with endpoint affinity is favored while creating assignment. Accepts inputs of type DOUBLE. | +| planner.broadcast_factor | 1 | A heuristic parameter for influencing the broadcast of records as part of a query. | | planner.broadcast_threshold | 10000000 | The maximum number of records allowed to be broadcast as part of a query. After one million records, Drill reshuffles data rather than doing a broadcast to one side of the join. Range: 0-2147483647 | | planner.disable_exchanges | FALSE | Toggles the state of hashing to a random exchange. | | planner.enable_broadcast_join | TRUE | Changes the state of aggregation and join operators. The broadcast join can be used for hash join, merge join and nested loop join. Use to join a large (fact) table to relatively smaller (dimension) tables. Do not disable. 
| @@ -55,17 +55,17 @@ The sys.options table lists the following options that you can set as a system o | planner.identifier_max_length | 1024 | A minimum length is needed because option names are identifiers themselves. | | planner.join.hash_join_swap_margin_factor | 10 | The number of join order sequences to consider during the planning phase. | | planner.join.row_count_estimate_factor | 1 | The factor for adjusting the estimated row count when considering multiple join order sequences during the planning phase. | -| planner.memory.average_field_width | 8 | | +| planner.memory.average_field_width | 8 | Used in estimating memory requirements. | | planner.memory.enable_memory_estimation | FALSE | Toggles the state of memory estimation and re-planning of the query. When enabled, Drill conservatively estimates memory requirements and typically excludes these operators from the plan and negatively impacts performance. | -| planner.memory.hash_agg_table_factor | 1.1 | | -| planner.memory.hash_join_table_factor | 1.1 | | +| planner.memory.hash_agg_table_factor | 1.1 | A heuristic value for influencing the size of the hash aggregation table. | +| planner.memory.hash_join_table_factor | 1.1 | A heuristic value for influencing the size of the hash join table. | | planner.memory.max_query_memory_per_node | 2147483648 bytes | Sets the maximum estimate of memory for a query per node in bytes. If the estimate is too low, Drill re-plans the query without memory-constrained operators. | -| planner.memory.non_blocking_operators_memory | 64 | Extra query memory per node foer non-blocking operators. This option is currently used only for memory estimation. Range: 0-2048 MB | -| planner.nestedloopjoin_factor | 100 | | +| planner.memory.non_blocking_operators_memory | 64 | Extra query memory per node for non-blocking operators. This option is currently used only for memory estimation.
Range: 0-2048 MB | +| planner.nestedloopjoin_factor | 100 | A heuristic value for influencing the nested loop join. | | planner.partitioner_sender_max_threads | 8 | Upper limit of threads for outbound queuing. | -| planner.partitioner_sender_set_threads | -1 | | -| planner.partitioner_sender_threads_factor | 2 | | -| planner.producer_consumer_queue_size | 10 | How much data to prefetch from disk (in record batches) out of band of query execution | +| planner.partitioner_sender_set_threads | -1 | Overrides the number of threads used to send out batches of records. Set to -1 to disable. Typically not changed. | +| planner.partitioner_sender_threads_factor | 2 | A heuristic parameter for influencing the final number of threads. The higher the value, the fewer the threads. | +| planner.producer_consumer_queue_size | 10 | How much data to prefetch from disk in record batches out-of-band of query execution. | | planner.slice_target | 100000 | The number of records manipulated within a fragment before Drill parallelizes operations. | | planner.width.max_per_node | 3 | Maximum number of threads that can run in parallel for a query on a node. A slice is an individual thread. This number indicates the maximum number of slices per query for the query's major fragment on a node. | | planner.width.max_per_query | 1000 | Same as max per node but applies to the query as executed by the entire cluster. For example, this value might be the number of active Drillbits, or a higher number to return results faster. | @@ -77,7 +77,7 @@ The sys.options table lists the following options that you can set as a system o | store.mongo.read_numbers_as_double | FALSE | Similar to store.json.read_numbers_as_double. | | store.parquet.block-size | 536870912 | Sets the size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system. | | store.parquet.compression | snappy | Compression type for storing Parquet output.
Allowed values: snappy, gzip, none | -| store.parquet.enable_dictionary_encoding | FALSE | Do not change. | +| store.parquet.enable_dictionary_encoding | FALSE | For internal use. Do not change. | | store.parquet.use_new_reader | FALSE | Not supported in this release. | | store.text.estimated_row_size_bytes | 100 | Estimate of the row size in a delimited text file, such as csv. The closer to actual, the better the query plan. Used for all csv files in the system/session where the value is set. Impacts the decision to plan a broadcast join or not. | | window.enable | FALSE | Not supported in this release. Coming soon. | http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/connect-a-data-source/080-drill-default-input-format.md ---------------------------------------------------------------------- diff --git a/_docs/connect-a-data-source/080-drill-default-input-format.md b/_docs/connect-a-data-source/080-drill-default-input-format.md index e817343..25a065b 100644 --- a/_docs/connect-a-data-source/080-drill-default-input-format.md +++ b/_docs/connect-a-data-source/080-drill-default-input-format.md @@ -61,7 +61,7 @@ steps: ## Querying Compressed JSON -You can use Drill 0.8 and later to query compressed JSON in .gz files as well as uncompressed files having the .json extension. First, add the gz extension to a storage plugin, and then use that plugin to query the compressed file. +You can query compressed JSON in .gz files as well as uncompressed files having the .json extension. First, add the gz extension to a storage plugin, and then use that plugin to query the compressed file. 
"extensions": [ "json", http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/data-sources-and-file-formats/030-deploying-and-using-a-hive-udf.md ---------------------------------------------------------------------- diff --git a/_docs/data-sources-and-file-formats/030-deploying-and-using-a-hive-udf.md b/_docs/data-sources-and-file-formats/030-deploying-and-using-a-hive-udf.md index 538c1e3..6a26376 100644 --- a/_docs/data-sources-and-file-formats/030-deploying-and-using-a-hive-udf.md +++ b/_docs/data-sources-and-file-formats/030-deploying-and-using-a-hive-udf.md @@ -22,7 +22,7 @@ After you export the custom UDF as a JAR, perform the UDF setup tasks so Drill c To set up the UDF: 1. Register Hive. [Register a Hive storage plugin]({{ site.baseurl }}/docs/registering-hive/) that connects Drill to a Hive data source. -2. In Drill 0.7 and later, add the JAR for the UDF to the Drill CLASSPATH. In earlier versions of Drill, place the JAR file in the `/jars/3rdparty` directory of the Drill installation on all nodes running a Drillbit. +2. Add the JAR for the UDF to the Drill CLASSPATH. In earlier versions of Drill, place the JAR file in the `/jars/3rdparty` directory of the Drill installation on all nodes running a Drillbit. 3. On each Drill node in the cluster, restart the Drillbit. 
`<drill installation directory>/bin/drillbit.sh restart` http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/data-sources-and-file-formats/050-json-data-model.md ---------------------------------------------------------------------- diff --git a/_docs/data-sources-and-file-formats/050-json-data-model.md b/_docs/data-sources-and-file-formats/050-json-data-model.md index 29efeb2..28ab921 100644 --- a/_docs/data-sources-and-file-formats/050-json-data-model.md +++ b/_docs/data-sources-and-file-formats/050-json-data-model.md @@ -12,7 +12,7 @@ Semi-structured JSON data often consists of complex, nested elements having sche Using Drill you can natively query dynamic JSON data sets using SQL. Drill treats a JSON object as a SQL record. One object equals one row in a Drill table. -Drill 0.8 and higher can [query compressed .gz files]({{ site.baseurl }}/docs/drill-default-input-format#querying-compressed-json) having JSON as well as uncompressed .json files. +You can also [query compressed .gz files]({{ site.baseurl }}/docs/drill-default-input-format#querying-compressed-json) having JSON as well as uncompressed .json files. In addition to the examples presented later in this section, see ["How to Analyze Highly Dynamic Datasets with Apache Drill"](https://www.mapr.com/blog/how-analyze-highly-dynamic-datasets-apache-drill) for information about how to analyze a JSON data set. @@ -56,7 +56,7 @@ When you set this option, Drill reads all numbers from the JSON files as DOUBLE. * Cast JSON values to [SQL types]({{ site.baseurl }}/docs/data-types), such as BIGINT, FLOAT, and INTEGER. * Cast JSON strings to [Drill Date/Time Data Type Formats]({{ site.baseurl }}/docs/supported-date-time-data-type-formats). -Drill uses [map and array data types]({{ site.baseurl }}/docs/data-types) internally for reading complex and nested data structures from JSON. 
You can cast data in a map or array of data to return a value from the structure, as shown in ["Create a view on a MapR-DB table"]({{ site.baseurl }}/docs/lesson-2-run-queries-with-ansi-sql). "Query Complex Data" shows how to access nested arrays. +Drill uses [map and array data types]({{ site.baseurl }}/docs/data-types) internally for reading complex and nested data structures from JSON. You can cast data in a map or array of data to return a value from the structure, as shown in ["Create a view on a MapR-DB table"]({{ site.baseurl }}/docs/lesson-2-run-queries-with-ansi-sql). ["Query Complex Data"]({{ site.baseurl }}/docs/querying-complex-data-introduction) shows how to access nested arrays. ## Reading JSON To read JSON data using Drill, use a [file system storage plugin]({{ site.baseurl }}/docs/connect-to-a-data-source) that defines the JSON format. You can use the `dfs` storage plugin, which includes the definition. http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/odbc-jdbc-interfaces/010-interfaces-introduction.md ---------------------------------------------------------------------- diff --git a/_docs/odbc-jdbc-interfaces/010-interfaces-introduction.md b/_docs/odbc-jdbc-interfaces/010-interfaces-introduction.md index fd4346e..d7bba62 100644 --- a/_docs/odbc-jdbc-interfaces/010-interfaces-introduction.md +++ b/_docs/odbc-jdbc-interfaces/010-interfaces-introduction.md @@ -18,8 +18,8 @@ MapR provides ODBC drivers for Windows, Mac OS X, and Linux. It is recommended that you install the latest version of Apache Drill with the latest version of the Drill ODBC driver.
-For example, if you have Apache Drill 0.5 and a Drill ODBC driver installed on -your machine, and then you upgrade to Apache Drill 0.6, do not assume that the +For example, if you have Apache Drill 0.8 and a Drill ODBC driver installed on +your machine, and then you upgrade to Apache Drill 1.0, do not assume that the Drill ODBC driver installed on your machine will work with the new version of Apache Drill. Install the latest available Drill ODBC driver to ensure that the two components work together. http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/odbc-jdbc-interfaces/050-using-microstrategy-analytics-with-apache-drill.md ---------------------------------------------------------------------- diff --git a/_docs/odbc-jdbc-interfaces/050-using-microstrategy-analytics-with-apache-drill.md b/_docs/odbc-jdbc-interfaces/050-using-microstrategy-analytics-with-apache-drill.md index d6140aa..cdade1c 100755 --- a/_docs/odbc-jdbc-interfaces/050-using-microstrategy-analytics-with-apache-drill.md +++ b/_docs/odbc-jdbc-interfaces/050-using-microstrategy-analytics-with-apache-drill.md @@ -142,12 +142,11 @@ In this scenario, you learned how to configure MicroStrategy Analytics Enterpris ### Certification Links -MicroStrategy announced post certification of Drill 0.6 and 0.7 with MicroStrategy Analytics Enterprise 9.4.1 +* MicroStrategy certifies its analytics platform with Apache Drill: http://ir.microstrategy.com/releasedetail.cfm?releaseid=902795 +* http://community.microstrategy.com/t5/Database/TN225724-Post-Certification-of-MapR-Drill-0-6-and-0-7-with/ta-p/225724 -http://community.microstrategy.com/t5/Database/TN225724-Post-Certification-of-MapR-Drill-0-6-and-0-7-with/ta-p/225724 +* http://community.microstrategy.com/t5/Release-Notes/TN231092-Certified-Database-and-ODBC-configurations-for/ta-p/231092 -http://community.microstrategy.com/t5/Release-Notes/TN231092-Certified-Database-and-ODBC-configurations-for/ta-p/231092 - 
-http://community.microstrategy.com/t5/Release-Notes/TN231094-Certified-Database-and-ODBC-configurations-for/ta-p/231094 +* http://community.microstrategy.com/t5/Release-Notes/TN231094-Certified-Database-and-ODBC-configurations-for/ta-p/231094 http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/050-using-drill-explorer-on-windows.md ---------------------------------------------------------------------- diff --git a/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/050-using-drill-explorer-on-windows.md b/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/050-using-drill-explorer-on-windows.md index be0389d..3d84978 100644 --- a/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/050-using-drill-explorer-on-windows.md +++ b/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/050-using-drill-explorer-on-windows.md @@ -39,9 +39,9 @@ Preview again. 9. Click **Create As**. The _Create As_ dialog displays. 10. In the **Schema** field, select the schema where you want to save the view. - As of 0.4.0, you can only save views to file-based schemas. + You can save views only to file-based schemas. 11. In the **View Name** field, enter a descriptive name for the view. - As of 0.4.0, do not include spaces in the view name. + Do not include spaces in the view name. 12. Click **Save**. The status and any error message associated with the view creation displays in the Create As dialog. 
When a view saves successfully, the Save button changes http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md ---------------------------------------------------------------------- diff --git a/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md b/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md index 37f0d37..c822ada 100644 --- a/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md +++ b/_docs/tutorials/030-analyzing-the-yelp-academic-dataset.md @@ -33,7 +33,7 @@ want to scale your environment. ### Step 2 : Open the Drill tar file - tar -xvf apache-drill-0.6.0-incubating.tar + tar -xvf apache-drill-0.1.0.tar.gz ### Step 3: Launch SQLLine, a JDBC application that ships with Drill @@ -352,8 +352,7 @@ exploring data in ways we have never seen before with SQL technologies. The community is working on more exciting features around nested data and supporting data with changing schemas in upcoming releases. -As an example, a new FLATTEN function is in development (an upcoming feature -in 0.7). This function can be used to dynamically rationalize semi-structured +The FLATTEN function can be used to dynamically rationalize semi-structured data so you can apply even deeper SQL functionality. 
Here is a sample query: #### Get a flattened list of categories for each business http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/tutorials/040-learn-drill-with-the-mapr-sandbox.md ---------------------------------------------------------------------- diff --git a/_docs/tutorials/040-learn-drill-with-the-mapr-sandbox.md b/_docs/tutorials/040-learn-drill-with-the-mapr-sandbox.md index ab12d86..e64dc6f 100644 --- a/_docs/tutorials/040-learn-drill-with-the-mapr-sandbox.md +++ b/_docs/tutorials/040-learn-drill-with-the-mapr-sandbox.md @@ -15,23 +15,6 @@ the following pages in order: * [Lesson 3: Run Queries on Complex Data Types]({{ site.baseurl }}/docs/lesson-3-run-queries-on-complex-data-types) * [Summary]({{ site.baseurl }}/docs/summary) -## About Apache Drill - -Drill is an Apache open-source SQL query engine for Big Data exploration. -Drill is designed from the ground up to support high-performance analysis on -the semi-structured and rapidly evolving data coming from modern Big Data -applications, while still providing the familiarity and ecosystem of ANSI SQL, -the industry-standard query language. Drill provides plug-and-play integration -with existing Apache Hive and Apache HBase deployments.Apache Drill 0.5 offers -the following key features: - - * Low-latency SQL queries - * Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore. - * ANSI SQL - * Nested data support - * Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs) - * BI/SQL tool integration using standard JDBC/ODBC drivers - ## MapR Sandbox with Apache Drill MapR includes Apache Drill as part of the Hadoop distribution. The MapR @@ -45,7 +28,7 @@ refer to the [Apache Drill web site](http://drill.apache.org) and ]({{ site.baseurl }}/docs)for more details. 
-Note that Hadoop is not a prerequisite for Drill and users can start ramping +Hadoop is not a prerequisite for Drill and users can start ramping up with Drill by running SQL queries directly on the local file system. Refer to [Apache Drill in 10 minutes]({{ site.baseurl }}/docs/drill-in-10-minutes) for an introduction to using Drill in local (embedded) mode. http://git-wip-us.apache.org/repos/asf/drill/blob/c38e6a18/_docs/tutorials/learn-drill-with-the-mapr-sandbox/010-installing-the-apache-drill-sandbox.md ---------------------------------------------------------------------- diff --git a/_docs/tutorials/learn-drill-with-the-mapr-sandbox/010-installing-the-apache-drill-sandbox.md b/_docs/tutorials/learn-drill-with-the-mapr-sandbox/010-installing-the-apache-drill-sandbox.md index e5ea95b..0081fc5 100755 --- a/_docs/tutorials/learn-drill-with-the-mapr-sandbox/010-installing-the-apache-drill-sandbox.md +++ b/_docs/tutorials/learn-drill-with-the-mapr-sandbox/010-installing-the-apache-drill-sandbox.md @@ -113,7 +113,7 @@ VirtualBox adapter. 9. Click Settings.  - The MapR-Sandbox-For-Apache-Drill-0.6.0-r2-4.0.1 - Settings dialog appears. + The MapR-Sandbox-For-Apache-Drill - Settings dialog appears.  10. Click **OK** to continue.
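As background for reviewing the option-table changes in this commit: the system and session options documented above are normally inspected and set from SQLLine rather than by editing files. A minimal sketch, using option names from the updated table (exact `sys.options` column names may vary by Drill version):

```sql
-- Inspect the current planner options described in the table above
SELECT name, type, num_val, bool_val
FROM sys.options
WHERE name LIKE 'planner.%';

-- Change an option cluster-wide (persists across sessions)
ALTER SYSTEM SET `planner.width.max_per_node` = 5;

-- Override an option for the current connection only
ALTER SESSION SET `planner.slice_target` = 50000;
```

`ALTER SESSION` settings last only for the life of the connection; `ALTER SYSTEM` settings apply to all sessions, as described in the "Planning and Execution Options" section referenced throughout this diff.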
