Repository: drill
Updated Branches:
  refs/heads/gh-pages ae07d7f8f -> 1fbd74fe2
http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/090-configuring-jreport-with-drill.md ---------------------------------------------------------------------- diff --git a/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/090-configuring-jreport-with-drill.md b/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/090-configuring-jreport-with-drill.md index 668927f..151d86b 100644 --- a/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/090-configuring-jreport-with-drill.md +++ b/_docs/odbc-jdbc-interfaces/using-drill-with-bi-tools/090-configuring-jreport-with-drill.md @@ -1,6 +1,6 @@ --- title: "Configuring JReport with Drill" -date: +date: 2018-02-09 00:16:03 UTC parent: "Using Drill with BI Tools" --- @@ -12,9 +12,8 @@ You can use JReport 13.1 and the Apache Drill JDBC Driver to easily extract data 2. Create a new JReport Catalog to manage the Drill connection. 3. Use JReport Designer to query the data and create a report. ----------- -### Step 1: Install the Drill JDBC Driver with JReport +## Step 1: Install the Drill JDBC Driver with JReport Drill provides standard JDBC connectivity to integrate with JReport. JReport 13.1 requires Drill 1.0 or later. For general instructions on installing the Drill JDBC driver, see [Using JDBC]({{ site.baseurl }}/docs/using-the-jdbc-driver/). @@ -33,9 +32,8 @@ For general instructions on installing the Drill JDBC driver, see [Using JDBC]({ 4. Verify that the JReport system can resolve the hostnames of the ZooKeeper nodes of the Drill cluster. You can do this by configuring DNS for all of the systems. Alternatively, you can edit the hosts file on the JReport system to include the hostnames and IP addresses of all the ZooKeeper nodes used with the Drill cluster. For Linux systems, the hosts file is located at `/etc/hosts`. 
For Windows systems, the hosts file is located at `%WINDIR%\system32\drivers\etc\hosts` Here is an example of a Windows hosts file:  ----------- -### Step 2: Create a New JReport Catalog to Manage the Drill Connection +## Step 2: Create a New JReport Catalog to Manage the Drill Connection 1. Click Create **New -> Catalog…** 2. Provide a catalog file name and click **…** to choose the file-saving location. @@ -49,7 +47,7 @@ For general instructions on installing the Drill JDBC driver, see [Using JDBC]({ 10. Click **Done** when you have added all the tables you need. -### Step 3: Use JReport Designer +## Step 3: Use JReport Designer 1. In the Catalog Browser, right-click **Queries** and select **Add Query…** 2. Define a JReport query by using the Query Editor. You can also import your own SQL statements.  http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/query-data/query-a-file-system/050-querying-sequence-files.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/query-a-file-system/050-querying-sequence-files.md b/_docs/query-data/query-a-file-system/050-querying-sequence-files.md index f9b916f..941bcfc 100644 --- a/_docs/query-data/query-a-file-system/050-querying-sequence-files.md +++ b/_docs/query-data/query-a-file-system/050-querying-sequence-files.md @@ -1,32 +1,27 @@ --- title: "Querying Sequence Files" -date: 2016-11-21 22:14:46 UTC +date: 2018-02-09 00:16:04 UTC parent: "Querying a File System" --- -Sequence files are flat files storing binary key value pairs. -Drill projects sequence files as table with two columns 'binary_key', 'binary_value'. +Sequence files are flat files that store binary key value pairs. +Drill projects sequence files as a table with two columns 'binary_key', 'binary_value'. -### Querying sequence file. +## Querying a Sequence File -Start drill shell +Start the Drill shell and enter your query.
- SELECT * - FROM dfs.tmp.`simple.seq` - LIMIT 1; + SELECT * FROM dfs.tmp.`simple.seq` LIMIT 1; +--------------+---------------+ | binary_key | binary_value | +--------------+---------------+ | [B@70828f46 | [B@b8c765f | +--------------+---------------+ -Since simple.seq contains byte serialized strings as keys and values, we can convert them to strings. +Since simple.seq contains byte serialized strings as keys and values, you can convert them to strings. - SELECT CONVERT_FROM(binary_key, 'UTF8'), CONVERT_FROM(binary_value, 'UTF8') - FROM dfs.tmp.`simple.seq` - LIMIT 1 - ; + SELECT CONVERT_FROM(binary_key, 'UTF8'), CONVERT_FROM(binary_value, 'UTF8') FROM dfs.tmp.`simple.seq` LIMIT 1; +-----------+-------------+ | EXPR$0 | EXPR$1 | +-----------+-------------+ http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md b/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md index 1f5c859..afd2786 100644 --- a/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md +++ b/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md @@ -1,14 +1,15 @@ --- title: "Querying Complex Data Introduction" -date: +date: 2018-02-09 00:16:04 UTC parent: "Querying Complex Data" --- Apache Drill queries do not require prior knowledge of the actual data you are trying to access, regardless of its source system or its schema and data types. The sweet spot for Apache Drill is a SQL query workload against *complex data*: data made up of various types of records and fields, rather -than data in a recognizable relational form (discrete rows and columns). Drill -is capable of discovering the form of the data when you submit the query. 
+than data in a recognizable relational form (discrete rows and columns). + +Drill is capable of discovering the form of the data when you submit the query. Nested data formats such as JSON (JavaScript Object Notation) files and Parquet files are not only _accessible_: Drill provides special operators and functions that you can use to _drill down_ into these files and ask @@ -38,7 +39,7 @@ examples show how to use the Drill extensions in the context of standard SQL SELECT statements. For the most part, the extensions use standard JavaScript notation for referencing data elements in a hierarchy. -### Before You Begin +## Before You Begin The examples in this section operate on JSON data files. In order to write your own queries, you need to be aware of the basic data types in these files: http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/rn/073-alpha-rn.md ---------------------------------------------------------------------- diff --git a/_docs/rn/073-alpha-rn.md b/_docs/rn/073-alpha-rn.md index 256300a..0121642 100644 --- a/_docs/rn/073-alpha-rn.md +++ b/_docs/rn/073-alpha-rn.md @@ -2,7 +2,7 @@ title: "Apache Drill M1 Release Notes (Apache Drill Alpha)" parent: "Release Notes" --- -### Milestone 1 Goals +## Milestone 1 Goals The first release of Apache Drill is designed as a technology preview for people to better understand the architecture and vision. It is a functional @@ -21,7 +21,7 @@ architectural analysis and performance optimization. * Provided an extensible query-language agnostic JSON-base logical data flow syntax. 
* Support complex data type manipulation via logical plan operations -### Known Issues +## Known Issues SQL Parsing Because Apache Drill is built to support late-bound changing schemas while SQL http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/rn/074-m1-alpha-rn.md ---------------------------------------------------------------------- diff --git a/_docs/rn/074-m1-alpha-rn.md b/_docs/rn/074-m1-alpha-rn.md index 256300a..0121642 100644 --- a/_docs/rn/074-m1-alpha-rn.md +++ b/_docs/rn/074-m1-alpha-rn.md @@ -2,7 +2,7 @@ title: "Apache Drill M1 Release Notes (Apache Drill Alpha)" parent: "Release Notes" --- -### Milestone 1 Goals +## Milestone 1 Goals The first release of Apache Drill is designed as a technology preview for people to better understand the architecture and vision. It is a functional @@ -21,7 +21,7 @@ architectural analysis and performance optimization. * Provided an extensible query-language agnostic JSON-base logical data flow syntax. * Support complex data type manipulation via logical plan operations -### Known Issues +## Known Issues SQL Parsing Because Apache Drill is built to support late-bound changing schemas while SQL http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/sql-reference/030-lexical-structure.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/030-lexical-structure.md b/_docs/sql-reference/030-lexical-structure.md index 7ca96fd..eca8ca0 100644 --- a/_docs/sql-reference/030-lexical-structure.md +++ b/_docs/sql-reference/030-lexical-structure.md @@ -1,6 +1,6 @@ --- title: "Lexical Structure" -date: 2017-11-14 15:31:15 UTC +date: 2018-02-09 00:16:05 UTC parent: "SQL Reference" --- @@ -72,7 +72,7 @@ Format dates using dashes (-) to separate year, month, and day. 
Format time usin * Timestamp: 2008-12-15 22:55:55.12345 -If you have dates and times in other formats, use a [data type conversion function](/data-type-conversion/#other-data-type-conversions) in your queries. +If you have dates and times in other formats, use a [data type conversion function]({{site.baseurl}}/docs/data-type-conversion/) in your queries. ### Identifiers An identifier is a letter followed by any sequence of letters, digits, or the underscore. For example, names of tables, columns, and aliases are identifiers. Maximum length is 1024 characters. Enclose the following identifiers with identifier quotes: http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/sql-reference/sql-commands/005-supported-sql-commands.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/005-supported-sql-commands.md b/_docs/sql-reference/sql-commands/005-supported-sql-commands.md index 2d838d9..9d242e9 100644 --- a/_docs/sql-reference/sql-commands/005-supported-sql-commands.md +++ b/_docs/sql-reference/sql-commands/005-supported-sql-commands.md @@ -1,6 +1,6 @@ --- title: "Supported SQL Commands" -date: 2018-02-08 00:38:57 UTC +date: 2018-02-09 00:16:05 UTC parent: "SQL Commands" --- The following table provides a list of the SQL commands that Drill supports, @@ -26,4 +26,4 @@ with their descriptions and example syntax: | [SHOW FILES]({{site.baseurl}}/docs/show-files) | Returns a list of files in a file system schema. | SHOW FILES IN|FROM filesystem.\`schema_name`; | | [SHOW SCHEMAS]({{site.baseurl}}/docs/show-databases-and-show-schemas) | Returns a list of available schemas. Equivalent to SHOW DATABASES. | SHOW SCHEMAS; | | [SHOW TABLES]({{site.baseurl}}/docs/show-tables) | Returns a list of tables and views. | SHOW TABLES; | -| [USE]({{site.baseurl}}/docs/use) | Change to a particular schema. When you opt to use a particular schema, Drill issues queries on that schema only. 
| USE schema_name; | | +| [USE]({{site.baseurl}}/docs/use) | Change to a particular schema. When you opt to use a particular schema, Drill issues queries on that schema only. | USE schema_name; | http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/sql-reference/sql-commands/070-explain.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/070-explain.md b/_docs/sql-reference/sql-commands/070-explain.md index 871643c..508db20 100644 --- a/_docs/sql-reference/sql-commands/070-explain.md +++ b/_docs/sql-reference/sql-commands/070-explain.md @@ -1,6 +1,6 @@ --- title: "EXPLAIN" -date: +date: 2018-02-09 00:16:06 UTC parent: "SQL Commands" --- EXPLAIN is a useful tool for examining the steps that a query goes through @@ -35,17 +35,17 @@ The EXPLAIN command supports the following syntax: where `query` is any valid SELECT statement supported by Drill. -##### INCLUDING ALL ATTRIBUTES +**INCLUDING ALL ATTRIBUTES** This option returns costing information. You can use this option for both -physical and logical plans. +physical and logical plans. -#### WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION +**WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION** These options return the physical and logical plan information, respectively. The default is physical (WITH IMPLEMENTATION). -## EXPLAIN for Physical Plans +### EXPLAIN for Physical Plans The EXPLAIN PLAN FOR <query> command returns the chosen physical execution plan for a query statement without running the query. You can use this command @@ -106,7 +106,7 @@ for submitting the query via Drill APIs. }, .... -## Costing Information +**Costing Information** Add the INCLUDING ALL ATTRIBUTES option to the EXPLAIN command to see cost estimates for the query plan. For example: @@ -124,7 +124,7 @@ estimates for the query plan. 
For example: 00-05 Project(T1¦¦*=[$0], type=[$1]): rowcount = 1.0, cumulative cost = {1.0 rows, 8.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 884 00-06 Scan(groupscan=[EasyGroupScan [selectionRoot=/home/donuts/donuts.json, numFiles=1, columns=[`*`], files=[file:/home/donuts/donuts.json]]]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 883 -## EXPLAIN for Logical Plans +### EXPLAIN for Logical Plans To return the logical plan for a query (again, without actually running the query), use the EXPLAIN PLAN WITHOUT IMPLEMENTATION syntax: http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/_docs/sql-reference/sql-commands/110-show-tables.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/110-show-tables.md b/_docs/sql-reference/sql-commands/110-show-tables.md index 54169eb..4ef0fbf 100644 --- a/_docs/sql-reference/sql-commands/110-show-tables.md +++ b/_docs/sql-reference/sql-commands/110-show-tables.md @@ -1,6 +1,6 @@ --- title: "SHOW TABLES" -date: +date: 2018-02-09 00:16:06 UTC parent: "SQL Commands" --- The SHOW TABLES command returns a list of views created within a schema. It @@ -27,7 +27,7 @@ In this example, “`myviews`” is a workspace created within the When you use a particular schema and then issue the SHOW TABLES command, Drill returns the tables and views within that schema. -#### Limitations +## Limitations * You can create and query tables within the file system, however Drill does not return these tables when you issue the SHOW TABLES command. You can issue the [SHOW FILES ]({{ site.baseurl }}/docs/show-files-command)command to see a list of all files, tables, and views, including those created in Drill.
http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/blog/_posts/2014-12-09-running-sql-queries-on-amazon-s3.md ---------------------------------------------------------------------- diff --git a/blog/_posts/2014-12-09-running-sql-queries-on-amazon-s3.md b/blog/_posts/2014-12-09-running-sql-queries-on-amazon-s3.md index c58b4c2..a8a5781 100644 --- a/blog/_posts/2014-12-09-running-sql-queries-on-amazon-s3.md +++ b/blog/_posts/2014-12-09-running-sql-queries-on-amazon-s3.md @@ -3,7 +3,7 @@ layout: post title: "Running SQL Queries on Amazon S3" code: running-sql-queries-on-amazon-s3 excerpt: Drill enables you to run SQL queries directly on data in S3. There's no need to ingest the data into a managed cluster or transform the data. This is a step-by-step tutorial on how to use Drill with S3. -date: 2014-12-9 18:50:01 +date: 2018-02-09 00:16:07 UTC authors: ["namato"] --- The functionality and sheer usefulness of Drill is growing fast. If you're a user of some of the popular BI tools out there like Tableau or SAP Lumira, now is a good time to take a look at how Drill can make your life easier, especially if you're faced with the task of quickly getting a handle on large sets of unstructured data. With schema generated on the fly, you can save a lot of time and headaches by running SQL queries on the data where it rests without knowing much about columns or formats. There's even more good news: Drill also works with data stored in the cloud. With a few simple steps, you can configure the S3 storage plugin for Drill and be off to the races running queries. In this post we'll look at how to configure Drill to access data stored in an S3 bucket. @@ -19,11 +19,11 @@ At a high level, configuring Drill to access S3 bucket data is accomplished with Consult the [Architectural Overview](https://cwiki.apache.org/confluence/display/DRILL/Architectural+Overview) for a refresher on the architecture of Drill. 
-### Prerequisites +## Prerequisites These steps assume you have a [typical Drill cluster and ZooKeeper quorum](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes) configured and running. To access data in S3, you will need an S3 bucket configured and have the required Amazon security credentials in your possession. An [Amazon blog post](http://blogs.aws.amazon.com/security/post/Tx1R9KDN9ISZ0HF/Where-s-my-secret-access-key) has more information on how to get these from your account. -### Configuration Steps +## Configuration Steps To connect Drill to S3, all of the drillbit nodes will need to access code in the JetS3t library developed by Amazon. As of this writing, 0.9.2 is the latest version but you might want to check [the main page](https://jets3t.s3.amazonaws.com/toolkit/toolkit.html) to see if anything has been updated. Be sure to get version 0.9.2 or later as earlier versions have a bug relating to reading Parquet data. http://git-wip-us.apache.org/repos/asf/drill/blob/1fbd74fe/blog/_posts/2014-12-11-apache-drill-qa-panelist-spotlight.md ---------------------------------------------------------------------- diff --git a/blog/_posts/2014-12-11-apache-drill-qa-panelist-spotlight.md b/blog/_posts/2014-12-11-apache-drill-qa-panelist-spotlight.md index f1e95d5..b8e4a6d 100644 --- a/blog/_posts/2014-12-11-apache-drill-qa-panelist-spotlight.md +++ b/blog/_posts/2014-12-11-apache-drill-qa-panelist-spotlight.md @@ -31,17 +31,17 @@ Want to learn how to leverage Apache Drill in order to get better analytical ins Apache Drill committers Tomer Shiran, Jacques Nadeau, and Ted Dunning, as well as Tableau Product Manager Jeff Feng and Data Scientist Dr. Kirk Borne will be on hand to answer your questions. -#### Tomer Shiran, Apache Drill Founder (@tshiran) +## Tomer Shiran, Apache Drill Founder (@tshiran) Tomer Shiran is the founder of Apache Drill, and a PMC member and committer on the project. 
He is VP Product Management at MapR, responsible for product strategy, roadmap and new feature development. Prior to MapR, Tomer held numerous product management and engineering roles at Microsoft, most recently as the product manager for Microsoft Internet Security & Acceleration Server (now Microsoft Forefront). He is the founder of two websites that have served tens of millions of users, and received coverage in prestigious publications such as The New York Times, USA Today and The Times of London. Tomer is also the author of a 900-page programming book. He holds an MS in Computer Engineering from Carnegie Mellon University and a BS in Computer Science from Technion - Israel Institute of Technology. -#### Jeff Feng, Product Manager Tableau Software (@jtfeng) +## Jeff Feng, Product Manager Tableau Software (@jtfeng) Jeff Feng is a Product Manager at Tableau and leads their Big Data product roadmap & strategic vision. In his role, he focuses on joint technology integration and partnership efforts with a number of Hadoop, NoSQL and web application partners in helping users see and understand their data. -#### Ted Dunning, Apache Drill Comitter (@Ted_Dunning) +## Ted Dunning, Apache Drill Comitter (@Ted_Dunning) Ted Dunning is Chief Applications Architect at MapR Technologies and committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects and mentor for Apache Storm. He contributed to Mahout clustering, classification and matrix decomposition algorithms and helped expand the new version of Mahout Math library. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems, he built fraud detection systems for ID Analytics (LifeLock) and he has issued 24 patents to date. Ted has a PhD in computing science from University of Sheffield. When he’s not doing data science, he plays guitar and mandolin.
-#### Jacques Nadeau, Vice President, Apache Drill (@intjesus) +## Jacques Nadeau, Vice President, Apache Drill (@intjesus) Jacques Nadeau leads Apache Drill development efforts at MapR Technologies. He is an industry veteran with over 15 years of big data and analytics experience. Most recently, he was cofounder and CTO of search engine startup YapMap. Before that, he was director of new product engineering with Quigo (contextual advertising, acquired by AOL in 2007). He also built the Avenue A | Razorfish analytics data warehousing system and associated services practice (acquired by Microsoft). -#### Dr. Kirk Borne, George Mason University (@KirkDBorne) +## Dr. Kirk Borne, George Mason University (@KirkDBorne) Dr. Kirk Borne is a Transdisciplinary Data Scientist and an Astrophysicist. He is Professor of Astrophysics and Computational Science in the George Mason University School of Physics, Astronomy, and Computational Sciences. He has been at Mason since 2003, where he teaches and advises students in the graduate and undergraduate Computational Science, Informatics, and Data Science programs. Previously, he spent nearly 20 years in positions supporting NASA projects, including an assignment as NASA's Data Archive Project Scientist for the Hubble Space Telescope, and as Project Manager in NASA's Space Science Data Operations Office. He has extensive experience in big data and data science, including expertise in scientific data mining and data systems. He has published over 200 articles (research papers, conference papers, and book chapters), and given over 200 invited talks at conferences and universities worldwide. \ No newline at end of file
