update to yaml date format
Project: http://git-wip-us.apache.org/repos/asf/drill/repo Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/0020c696 Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/0020c696 Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/0020c696 Branch: refs/heads/gh-pages Commit: 0020c696dd97d6694920d0a30d8455a2e32058d9 Parents: 98851d3 Author: Bridget Bevens <bbev...@maprtech.com> Authored: Mon Nov 21 14:14:40 2016 -0800 Committer: Bridget Bevens <bbev...@maprtech.com> Committed: Mon Nov 21 14:14:40 2016 -0800 ---------------------------------------------------------------------- .../120-configuring-the-drill-shell.md | 2 +- _docs/developer-information/009-rest-api.md | 2 +- .../030-using-partition-pruning.md | 216 +++++++++---------- .../009-querying-avro-files.md | 2 +- .../010-querying-json-files.md | 2 +- .../020-querying-parquet-files.md | 2 +- .../030-querying-plain-text-files.md | 2 +- .../040-querying-directories.md | 2 +- .../050-querying-sequence-files.md | 2 +- 9 files changed, 116 insertions(+), 116 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/configure-drill/120-configuring-the-drill-shell.md ---------------------------------------------------------------------- diff --git a/_docs/configure-drill/120-configuring-the-drill-shell.md b/_docs/configure-drill/120-configuring-the-drill-shell.md index 5ab5060..6b7d3a8 100644 --- a/_docs/configure-drill/120-configuring-the-drill-shell.md +++ b/_docs/configure-drill/120-configuring-the-drill-shell.md @@ -1,6 +1,6 @@ --- title: "Configuring the Drill Shell" -date: +date: 2016-11-21 22:14:41 UTC parent: "Configure Drill" --- After [starting the Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/), you can type queries on the shell command line. At the Drill shell command prompt, typing "help" lists the configuration and other options you can set to manage shell functionality. 
Apache Drill 1.0 and later formats the resultset output tables for readability if possible. In this release, columns having 70 characters or more cannot be formatted. This document formats all output for readability and example purposes. http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/developer-information/009-rest-api.md ---------------------------------------------------------------------- diff --git a/_docs/developer-information/009-rest-api.md b/_docs/developer-information/009-rest-api.md index 043c4d1..ce9e03e 100644 --- a/_docs/developer-information/009-rest-api.md +++ b/_docs/developer-information/009-rest-api.md @@ -1,6 +1,6 @@ --- title: "REST API" -date: +date: 2016-11-21 22:14:41 UTC parent: "Developer Information" --- http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md ---------------------------------------------------------------------- diff --git a/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md b/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md index 210335f..ce2f77b 100644 --- a/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md +++ b/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md @@ -1,108 +1,108 @@ ---- -title: "How to Partition Data" -date: -parent: "Partition Pruning" ---- - -In Drill 1.1.0 and later, if the data source is Parquet, no data organization tasks are required to take advantage of partition pruning. To partition and query Parquet files generated from other tools, use Drill to read and rewrite the files and metadata using the CTAS command with the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement. - -The Parquet writer first sorts data by the partition keys, and then creates a new file when it encounters a new value for the partition columns. 
During partitioning, Drill creates separate files, but not separate directories, for different partitions. Each file contains exactly one partition value, but there can be multiple files for the same partition value. - -Partition pruning uses the Parquet column statistics to determine which columns to use to prune. - -Unlike using the Drill 1.0 partitioning, no view query is subsequently required, nor is it necessary to use the [dir* variables]({{site.baseurl}}/docs/querying-directories) after you use the PARTITION BY clause in a CTAS statement. - -## Drill 1.0 Partitioning - -Drill 1.0 does not support the PARTITION BY clause of the CTAS command supported by later versions. Partitioning Drill 1.0-generated data involves performing the following steps. - -1. Devise a logical way to store the data in a hierarchy of directories. -2. Use CTAS to create Parquet files from the original data, specifying filter conditions. -3. Move the files into directories in the hierarchy. - -After partitioning the data, you need to create a view of the partitioned data to query the data. You can use the [dir* variables]({{site.baseurl}}/docs/querying-directories) in queries to refer to subdirectories in your workspace path. - -### Drill 1.0 Partitioning Example - -Suppose you have text files containing several years of log data. To partition the data by year and quarter, create the following hierarchy of directories: - - …/logs/1994/Q1 - …/logs/1994/Q2 - …/logs/1994/Q3 - …/logs/1994/Q4 - …/logs/1995/Q1 - …/logs/1995/Q2 - …/logs/1995/Q3 - …/logs/1995/Q4 - …/logs/1996/Q1 - …/logs/1996/Q2 - …/logs/1996/Q3 - …/logs/1996/Q4 - -Run the following CTAS statement, filtering on the Q1 1994 data. - - CREATE TABLE TT_1994_Q1 - AS SELECT * FROM <raw table data in text format > - WHERE columns[1] = 1994 AND columns[2] = 'Q1' - -This creates a Parquet file with the log data for Q1 1994 in the current workspace.
You can then move the file into the correlating directory, and repeat the process until all of the files are stored in their respective directories. - -Now you can define views on the parquet files and query the views. - - 0: jdbc:drill:zk=local> create view vv1 as select `dir0` as `year`, `dir1` as `qtr` from dfs.`/Users/max/data/multilevel/parquet`; - +------------+------------+ - | ok | summary | - +------------+------------+ - | true | View 'vv1' created successfully in 'dfs.tmp' schema | - +------------+------------+ - 1 row selected (0.16 seconds) - -Query the view to see all of the logs. - - 0: jdbc:drill:zk=local> select * from dfs.tmp.vv1; - +------------+------------+ - | year | qtr | - +------------+------------+ - | 1994 | Q1 | - | 1994 | Q3 | - | 1994 | Q3 | - | 1994 | Q4 | - | 1994 | Q4 | - | 1994 | Q4 | - | 1994 | Q4 | - | 1995 | Q2 | - | 1995 | Q2 | - | 1995 | Q2 | - | 1995 | Q2 | - | 1995 | Q4 | - | 1995 | Q4 | - | 1995 | Q4 | - | 1995 | Q4 | - | 1995 | Q4 | - | 1995 | Q4 | - | 1995 | Q4 | - | 1996 | Q1 | - | 1996 | Q1 | - | 1996 | Q1 | - | 1996 | Q1 | - | 1996 | Q1 | - | 1996 | Q2 | - | 1996 | Q3 | - | 1996 | Q3 | - | 1996 | Q3 | - +------------+------------+ - ... - - -When you query the view, Drill can apply partition pruning and read only the files and directories required to return query results. 
- - 0: jdbc:drill:zk=local> explain plan for select * from dfs.tmp.vv1 where `year` = 1996 and qtr = 'Q2'; - +------------+------------+ - | text | json | - +------------+------------+ - | 00-00 Screen - 00-01 Project(year=[$0], qtr=[$1]) - 00-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/maxdata/multilevel/parquet/1996/Q2/orders_96_q2.parquet]], selectionRoot=/Users/max/data/multilevel/parquet, numFiles=1, columns=[`dir0`, `dir1`]]]) - - - +--- +title: "How to Partition Data" +date: 2016-11-21 22:14:42 UTC +parent: "Partition Pruning" +--- + +In Drill 1.1.0 and later, if the data source is Parquet, no data organization tasks are required to take advantage of partition pruning. To partition and query Parquet files generated from other tools, use Drill to read and rewrite the files and metadata using the CTAS command with the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement. + +The Parquet writer first sorts data by the partition keys, and then creates a new file when it encounters a new value for the partition columns. During partitioning, Drill creates separate files, but not separate directories, for different partitions. Each file contains exactly one partition value, but there can be multiple files for the same partition value. + +Partition pruning uses the Parquet column statistics to determine which columns to use to prune. + +Unlike using the Drill 1.0 partitioning, no view query is subsequently required, nor is it necessary to use the [dir* variables]({{site.baseurl}}/docs/querying-directories) after you use the PARTITION BY clause in a CTAS statement. + +## Drill 1.0 Partitioning + +Drill 1.0 does not support the PARTITION BY clause of the CTAS command supported by later versions. Partitioning Drill 1.0-generated data involves performing the following steps. + +1. Devise a logical way to store the data in a hierarchy of directories. +2. 
Use CTAS to create Parquet files from the original data, specifying filter conditions. +3. Move the files into directories in the hierarchy. + +After partitioning the data, you need to create a view of the partitioned data to query the data. You can use the [dir* variables]({{site.baseurl}}/docs/querying-directories) in queries to refer to subdirectories in your workspace path. + +### Drill 1.0 Partitioning Example + +Suppose you have text files containing several years of log data. To partition the data by year and quarter, create the following hierarchy of directories: + + …/logs/1994/Q1 + …/logs/1994/Q2 + …/logs/1994/Q3 + …/logs/1994/Q4 + …/logs/1995/Q1 + …/logs/1995/Q2 + …/logs/1995/Q3 + …/logs/1995/Q4 + …/logs/1996/Q1 + …/logs/1996/Q2 + …/logs/1996/Q3 + …/logs/1996/Q4 + +Run the following CTAS statement, filtering on the Q1 1994 data. + + CREATE TABLE TT_1994_Q1 + AS SELECT * FROM <raw table data in text format > + WHERE columns[1] = 1994 AND columns[2] = 'Q1' + +This creates a Parquet file with the log data for Q1 1994 in the current workspace. You can then move the file into the correlating directory, and repeat the process until all of the files are stored in their respective directories. + +Now you can define views on the parquet files and query the views. + + 0: jdbc:drill:zk=local> create view vv1 as select `dir0` as `year`, `dir1` as `qtr` from dfs.`/Users/max/data/multilevel/parquet`; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | View 'vv1' created successfully in 'dfs.tmp' schema | + +------------+------------+ + 1 row selected (0.16 seconds) + +Query the view to see all of the logs.
+ + 0: jdbc:drill:zk=local> select * from dfs.tmp.vv1; + +------------+------------+ + | year | qtr | + +------------+------------+ + | 1994 | Q1 | + | 1994 | Q3 | + | 1994 | Q3 | + | 1994 | Q4 | + | 1994 | Q4 | + | 1994 | Q4 | + | 1994 | Q4 | + | 1995 | Q2 | + | 1995 | Q2 | + | 1995 | Q2 | + | 1995 | Q2 | + | 1995 | Q4 | + | 1995 | Q4 | + | 1995 | Q4 | + | 1995 | Q4 | + | 1995 | Q4 | + | 1995 | Q4 | + | 1995 | Q4 | + | 1996 | Q1 | + | 1996 | Q1 | + | 1996 | Q1 | + | 1996 | Q1 | + | 1996 | Q1 | + | 1996 | Q2 | + | 1996 | Q3 | + | 1996 | Q3 | + | 1996 | Q3 | + +------------+------------+ + ... + + +When you query the view, Drill can apply partition pruning and read only the files and directories required to return query results. + + 0: jdbc:drill:zk=local> explain plan for select * from dfs.tmp.vv1 where `year` = 1996 and qtr = 'Q2'; + +------------+------------+ + | text | json | + +------------+------------+ + | 00-00 Screen + 00-01 Project(year=[$0], qtr=[$1]) + 00-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/maxdata/multilevel/parquet/1996/Q2/orders_96_q2.parquet]], selectionRoot=/Users/max/data/multilevel/parquet, numFiles=1, columns=[`dir0`, `dir1`]]]) + + + http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/query-data/query-a-file-system/009-querying-avro-files.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/query-a-file-system/009-querying-avro-files.md b/_docs/query-data/query-a-file-system/009-querying-avro-files.md index 7f25d22..308b990 100644 --- a/_docs/query-data/query-a-file-system/009-querying-avro-files.md +++ b/_docs/query-data/query-a-file-system/009-querying-avro-files.md @@ -1,6 +1,6 @@ --- title: "Querying Avro Files" -date: +date: 2016-11-21 22:14:43 UTC parent: "Querying a File System" --- http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/query-data/query-a-file-system/010-querying-json-files.md 
---------------------------------------------------------------------- diff --git a/_docs/query-data/query-a-file-system/010-querying-json-files.md b/_docs/query-data/query-a-file-system/010-querying-json-files.md index 1ddf5ef..4f3f855 100644 --- a/_docs/query-data/query-a-file-system/010-querying-json-files.md +++ b/_docs/query-data/query-a-file-system/010-querying-json-files.md @@ -1,6 +1,6 @@ --- title: "Querying JSON Files" -date: +date: 2016-11-21 22:14:43 UTC parent: "Querying a File System" --- To query complex JSON files, you need to understand the ["JSON Data Model"]({{site.baseurl}}/docs/json-data-model/). This section provides a trivial example of querying a sample file that Drill installs. http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/query-data/query-a-file-system/020-querying-parquet-files.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/query-a-file-system/020-querying-parquet-files.md b/_docs/query-data/query-a-file-system/020-querying-parquet-files.md index fbb1466..c63f3ad 100644 --- a/_docs/query-data/query-a-file-system/020-querying-parquet-files.md +++ b/_docs/query-data/query-a-file-system/020-querying-parquet-files.md @@ -1,6 +1,6 @@ --- title: "Querying Parquet Files" -date: +date: 2016-11-21 22:14:44 UTC parent: "Querying a File System" --- http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md index e451112..f258e6e 100644 --- a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md +++ b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md @@ -1,6 +1,6 @@ --- title: "Querying Plain Text Files" -date: +date: 2016-11-21 22:14:45 UTC parent: 
"Querying a File System" --- You can use Drill to access structured file types and plain text files http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/query-data/query-a-file-system/040-querying-directories.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/query-a-file-system/040-querying-directories.md b/_docs/query-data/query-a-file-system/040-querying-directories.md index f3e5ee4..2fbefee 100644 --- a/_docs/query-data/query-a-file-system/040-querying-directories.md +++ b/_docs/query-data/query-a-file-system/040-querying-directories.md @@ -1,6 +1,6 @@ --- title: "Querying Directories" -date: +date: 2016-11-21 22:14:46 UTC parent: "Querying a File System" --- You can store multiple files in a directory and query them as if they were a http://git-wip-us.apache.org/repos/asf/drill/blob/0020c696/_docs/query-data/query-a-file-system/050-querying-sequence-files.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/query-a-file-system/050-querying-sequence-files.md b/_docs/query-data/query-a-file-system/050-querying-sequence-files.md index b163621..f9b916f 100644 --- a/_docs/query-data/query-a-file-system/050-querying-sequence-files.md +++ b/_docs/query-data/query-a-file-system/050-querying-sequence-files.md @@ -1,6 +1,6 @@ --- title: "Querying Sequence Files" -date: +date: 2016-11-21 22:14:46 UTC parent: "Querying a File System" ---