[4/4] drill git commit: DRILL-3169 multiple dir

bridgetb Tue, 26 May 2015 17:16:53 -0700

DRILL-3169 multiple dir


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/446d71c2
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/446d71c2
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/446d71c2

Branch: refs/heads/gh-pages
Commit: 446d71c242edf6ed6e65924e1b4089677540f151
Parents: fac8fd4
Author: Kristine Hahn <[email protected]>
Authored: Tue May 26 16:48:37 2015 -0700
Committer: Bridget Bevens <[email protected]>
Committed: Tue May 26 17:14:30 2015 -0700

----------------------------------------------------------------------
 .../030-querying-plain-text-files.md            | 95 ++------------------
 .../040-querying-directories.md                 | 45 ++--------
 2 files changed, 12 insertions(+), 128 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/446d71c2/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
index aeb3543..f79f2b9 100644
--- a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
+++ b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
@@ -194,104 +194,23 @@ times a year in the books that Google scans.
          +------------------------------------+-------------------+------------+
          5 rows selected (1.175 seconds)
 
-The Drill default storage plugins support common file formats. If you need
-support for some other file format, such as GZ, create a custom storage plugin. You can also create a storage plugin to simplify querying files having long path names. A workspace name replaces the long path name.
+The Drill default storage plugins support common file formats. 
 
 
-## Create a Storage Plugin
+## Query the GZ File Directly
 
-This example covers how to create and use a storage plugin to simplify queries or to query a file type that `dfs` does not specify, GZ in this case. First, you create the storage plugin in the Drill Web UI. Next, you connect to the
-file through the plugin to query a file.
+This example covers how to query the GZ file containing the compressed TSV data. The GZ file name needs to be renamed to specify the type of delimited file, such as CSV or TSV. You add `.tsv` before the `.gz` extension in this example.
 
-You can create a storage plugin using the Apache Drill Web UI to query the GZ file containing the compressed TSV data.
-
-  1. Create an `ngram` directory on your file system.
-  2. Copy the GZ file `googlebooks-eng-all-5gram-20120701-zo.gz` to the `ngram` directory.
-  3. Open the Drill Web UI by navigating to <http://localhost:8047/storage>.   
-     To open the Drill Web UI, the [Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/) must still be running.
-  4. In New Storage Plugin, type `myplugin`.  
-     ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png)    
-  5. Click **Create**.  
-     The Configuration screen appears.
-  6. Replace null with the following storage plugin definition, except on the location line, use the *full* path to your `ngram` directory instead of the drilluser's path and give your workspace an arbitrary name, for example, ngram:
-  
-        {
-          "type": "file",
-          "enabled": true,
-          "connection": "file:///",
-          "workspaces": {
-            "ngram": {
-              "location": "/Users/drilluser/ngram",
-              "writable": false,
-              "defaultInputFormat": null
-           }
-         },
-         "formats": {
-           "tsv": {
-             "type": "text",
-             "extensions": [
-               "gz"
-             ],
-             "delimiter": "\t"
-            }
-          }
-        }
-
-  7. Click **Create**.  
-     The success message appears briefly.
-  8. Click **Back**.  
-     The new plugin appears in Enabled Storage Plugins.  
-     ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png) 
-  9. Go back to the Drill shell, and list the storage plugins.  
-          SHOW DATABASES;
-
-          +---------------------+
-          |     SCHEMA_NAME     |
-          +---------------------+
-          | INFORMATION_SCHEMA  |
-          | cp.default          |
-          | dfs.default         |
-          | dfs.root            |
-          | dfs.tmp             |
-          | myplugin.default    |
-          | myplugin.ngram      |
-          | sys                 |
-          +---------------------+
-          8 rows selected (0.105 seconds)
-
-Your custom plugin appears in the list and has two workspaces: the `ngram`
-workspace that you defined and a default workspace.
-
-### Connect to and Query a File
-
-When querying the same data source repeatedly, avoiding long path names is
-important. This exercise demonstrates how to simplify the query. Instead of
-using the full path to the Ngram file, you use dot notation in the FROM
-clause.
-
-``<workspace name>.`<location>```
-
-This syntax assumes you connected to a storage plugin that defines the
-location of the data. To query the data source while you are _not_ connected to
-that storage plugin, include the plugin name:
-
-``<plugin name>.<workspace name>.`<location>```
-
-This exercise shows how to query Ngram data when you are connected to `myplugin`.
-
-  1. Connect to the ngram file through the custom storage plugin.  
-     `USE myplugin;`
-  2. Get data about "Zoological Journal of the Linnean" that appears more than 250 times a year in the books that Google scans. In the FROM clause, instead of using the full path to the file as you did in the last exercise, connect to the data using the storage plugin workspace name ngram.
+  1. Rename the GZ file `googlebooks-eng-all-5gram-20120701-zo.gz` to googlebooks-eng-all-5gram-20120701-zo.tsv.gz.
+  2. Query the renamed GZ file directly to get data about "Zoological Journal of the Linnean" that appears more than 250 times a year in the books that Google scans. In the FROM clause, instead of using the full path to the file as you did in the last exercise, connect to the data using the storage plugin workspace name ngram.
   
          SELECT COLUMNS[0], 
                 COLUMNS[1], 
                 COLUMNS[2] 
-         FROM ngram.`/googlebooks-eng-all-5gram-20120701-zo.gz` 
+         FROM dfs.`/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv.gz` 
          WHERE ((columns[0] = 'Zoological Journal of the Linnean') 
          AND (columns[2] > 250)) 
          LIMIT 10;
 
-     The five rows of output appear.  
-
-To continue with this example and query multiple files in a directory, see the section, ["Example of Querying Multiple Files in a Directory"]({{site.baseurl}}/docs/querying-directories/#example-of-querying-multiple-files-in-a-directory).
+     The 5 rows of output appear.  
 

http://git-wip-us.apache.org/repos/asf/drill/blob/446d71c2/_docs/query-data/query-a-file-system/040-querying-directories.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/040-querying-directories.md b/_docs/query-data/query-a-file-system/040-querying-directories.md
index 4a5b4ae..88b5b40 100644
--- a/_docs/query-data/query-a-file-system/040-querying-directories.md
+++ b/_docs/query-data/query-a-file-system/040-querying-directories.md
@@ -13,8 +13,8 @@ same structure: `plays.csv` and `moreplays.csv`. The first file contains 7
 records and the second file contains 3 records. The following query returns
 the "union" of the two files, ordered by the first column:
 
-    0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play 
-    from dfs.`/Users/brumsby/drill/testdata` order by 1;
+    0: jdbc:drill:zk=local> SELECT COLUMNS[0] AS `Year`, COLUMNS[1] AS Play 
+    FROM dfs.`/Users/brumsby/drill/testdata` order by 1;
  
     +------------+------------------------+
     |    Year    |          Play          |
@@ -49,7 +49,7 @@ You can query all of these files, or a subset, by referencing the file system
 once in a Drill query. For example, the following query counts the number of
 records in all of the files inside the `2013` directory:
 
-    0: jdbc:drill:> select count(*) from MFS.`/mapr/drilldemo/labs/clicks/logs/2013` ;
+    0: jdbc:drill:> SELECT COUNT(*) FROM MFS.`/mapr/drilldemo/labs/clicks/logs/2013` ;
     +------------+
     |   EXPR$0   |
     +------------+
@@ -64,7 +64,7 @@ subdirectories: `2012`, `2013`, and `2014`. The following query constrains
 files inside the subdirectory named `2013`. The variable `dir0` refers to the
 first level down from logs, `dir1` to the next level, and so on.
 
-    0: jdbc:drill:> use bob.logdata;
+    0: jdbc:drill:> USE bob.logdata;
     +------------+-----------------------------------------+
     |     ok     |              summary                    |
     +------------+-----------------------------------------+
@@ -72,7 +72,7 @@ first level down from logs, `dir1` to the next level, and so on.
     +------------+-----------------------------------------+
     1 row selected (0.305 seconds)
  
-    0: jdbc:drill:> select * from logs where dir0='2013' limit 10;
+    0: jdbc:drill:> SELECT * FROM logs WHERE dir0='2013' LIMIT 10;
     +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
     |    dir0    |    dir1    |  trans_id  |    date    |    time    |  cust_id   |   device   |   state    |  camp_id   |  keywords   |
     +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
@@ -89,38 +89,3 @@ first level down from logs, `dir1` to the next level, and so on.
     +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
     10 rows selected (0.583 seconds)
 
-## Example of Querying Multiple Files in a Directory
-
-This example is a continuation of the example in the section, ["Example of Querying a TSV File"]({{site.baseurl}}/docs/querying-plain-text-files/#example-of-querying-a-tsv-file) that creates a subdirectory in the `ngram` directory and [custom plugin workspace]({{site.baseurl}}/docs/querying-plain-text-files/#create-a-storage-plugin) you created earlier.
-
-You download a second Ngram file. Next, you
-move both Ngram GZ files you downloaded to the `ngram` subdirectory. Finally, using the custom
-plugin workspace, you query both files. In the FROM clause, simply reference
-the subdirectory.
-
-  1. Download a second file of compressed Google Ngram data from this location: 
-  
-     http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-2gram-20120701-ze.gz
-  2. Move `googlebooks-eng-all-2gram-20120701-ze.gz` to the `ngram/myfiles` subdirectory. 
-  3. Move the 5gram file you downloaded earlier `googlebooks-eng-all-5gram-20120701-zo.gz` to the `ngram/myfiles` subdirectory.
-  4. In the Drill shell, use the `myplugin.ngrams` workspace. 
-   
-          USE myplugin.ngram;
-  5. Query the myfiles directory for the "Zoological Journal of the Linnean" or "zero temperatures" in books published in 1998.
-  
-          SELECT * 
-          FROM myfiles 
-          WHERE (((COLUMNS[0] = 'Zoological Journal of the Linnean')
-            OR (COLUMNS[0] = 'zero temperatures')) 
-            AND (COLUMNS[1] = '1998'));
-The output lists ngrams from both files.
-
-          +----------------------------------------------------------+
-          |                         columns                          |
-          +----------------------------------------------------------+
-          | ["Zoological Journal of the Linnean","1998","157","53"]  |
-          | ["zero temperatures","1998","628","487"]                 |
-          +----------------------------------------------------------+
-          2 rows selected (7.007 seconds)
-
-For more information about querying directories, see the section, ["Query Directory Functions"]({{site.baseurl}}/docs/query-directory-functions).
\ No newline at end of file
