[1/2] drill-site git commit: edit files for JJ transform in DITA

bridgetb Thu, 08 Feb 2018 16:23:50 -0800

Repository: drill-site
Updated Branches:
  refs/heads/asf-site 36b5428e1 -> edc3f206c



http://git-wip-us.apache.org/repos/asf/drill-site/blob/edc3f206/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index af7a909..fc2708d 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,11 +6,94 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 07 Feb 2018 18:35:47 -0800</pubDate>
-    <lastBuildDate>Wed, 07 Feb 2018 18:35:47 -0800</lastBuildDate>
+    <pubDate>Thu, 08 Feb 2018 16:20:28 -0800</pubDate>
+    <lastBuildDate>Thu, 08 Feb 2018 16:20:28 -0800</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>
+        <title>Running SQL Queries on Amazon S3</title>
+        <description>&lt;p&gt;The functionality and sheer usefulness of Drill 
is growing fast.  If you&amp;#39;re a user of some of the popular BI tools out 
there like Tableau or SAP Lumira, now is a good time to take a look at how 
Drill can make your life easier, especially if  you&amp;#39;re faced with the 
task of quickly getting a handle on large sets of unstructured data.  With 
schema generated on the fly, you can save a lot of time and headaches by 
running SQL queries on the data where it rests without knowing much about 
columns or formats.  There&amp;#39;s even more good news:  Drill also works 
with data stored in the cloud.  With a few simple steps, you can configure the 
S3 storage plugin for Drill and be off to the races running queries.  In this 
post we&amp;#39;ll look at how to configure Drill to access data stored in an 
S3 bucket.&lt;/p&gt;
+
+&lt;p&gt;If you&amp;#39;re more of a visual person, you can skip this article 
entirely and &lt;a 
href=&quot;https://www.youtube.com/watch?v=w8gZ2nn_ZUQ&quot;&gt;go straight to 
a video&lt;/a&gt; I put together that walks through an end-to-end example with 
Tableau.  This example is easily extended to other BI tools, as the steps are 
identical on the Drill side.&lt;/p&gt;
+
+&lt;p&gt;At a high level, configuring Drill to access S3 bucket data is 
accomplished with the following steps on each node running a drillbit.&lt;/p&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Download and install the &lt;a 
href=&quot;http://www.jets3t.org/&quot;&gt;JetS3t&lt;/a&gt; JAR files and 
enable them.&lt;/li&gt;
+&lt;li&gt;Add your S3 credentials in the relevant XML configuration 
file.&lt;/li&gt;
+&lt;li&gt;Configure and enable the S3 storage plugin through the Drill web 
interface.&lt;/li&gt;
+&lt;li&gt;Connect your BI tool of choice and query away.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Consult the &lt;a 
href=&quot;https://cwiki.apache.org/confluence/display/DRILL/Architectural+Overview&quot;&gt;Architectural
 Overview&lt;/a&gt; for a refresher on the architecture of Drill.&lt;/p&gt;
+
+&lt;h2 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h2&gt;
+
+&lt;p&gt;These steps assume you have a &lt;a 
href=&quot;https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes&quot;&gt;typical
 Drill cluster and ZooKeeper quorum&lt;/a&gt; configured and running.  To 
access data in S3, you will need an S3 bucket configured and have the required 
Amazon security credentials in your possession.  An &lt;a 
href=&quot;http://blogs.aws.amazon.com/security/post/Tx1R9KDN9ISZ0HF/Where-s-my-secret-access-key&quot;&gt;Amazon
 blog post&lt;/a&gt; has more information on how to get these from your 
account.&lt;/p&gt;
+
+&lt;h2 id=&quot;configuration-steps&quot;&gt;Configuration Steps&lt;/h2&gt;
+
+&lt;p&gt;To connect Drill to S3, all of the drillbit nodes will need to access 
code in the open-source JetS3t library.  As of this writing, 0.9.2 is 
the latest version, but you might want to check &lt;a 
href=&quot;https://jets3t.s3.amazonaws.com/toolkit/toolkit.html&quot;&gt;the 
main page&lt;/a&gt; to see if anything has been updated.  Be sure to get 
version 0.9.2 or later, as earlier versions have a bug relating to reading 
Parquet data.&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;wget 
http://bitbucket.org/jmurty/jets3t/downloads/jets3t-0.9.2.zip
+unzip jets3t-0.9.2.zip &amp;amp;&amp;amp; cp jets3t-0.9.2/jars/jets3t-0.9.2.jar &lt;span 
class=&quot;nv&quot;&gt;$DRILL_HOME&lt;/span&gt;/jars/3rdparty
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;Next, enable the plugin by editing the file:&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span 
class=&quot;nv&quot;&gt;$DRILL_HOME&lt;/span&gt;/bin/hadoop_excludes.txt
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;and removing the line &lt;code&gt;jets3t&lt;/code&gt;.&lt;/p&gt;
+
+&lt;p&gt;Drill will need to know your S3 credentials in order to access data 
there. These credentials will need to be placed in the core-site.xml file for 
your installation.  If you already have a core-site.xml file configured for 
your environment, add the following parameters to it; otherwise, create the file 
from scratch.  If you do end up creating it from scratch, you will need to wrap 
these parameters with &lt;code&gt;&amp;lt;configuration&amp;gt;&lt;/code&gt; 
and &lt;code&gt;&amp;lt;/configuration&amp;gt;&lt;/code&gt;.&lt;/p&gt;
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-xml&quot; data-lang=&quot;xml&quot;&gt;&lt;span 
class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3.awsAccessKeyId&lt;span
 class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;ID&lt;span 
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3.awsSecretAccessKey&lt;span
 class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;SECRET&lt;span 
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3n.awsAccessKeyId&lt;span
 class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;ID&lt;span 
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;name&amp;gt;&lt;/span&gt;fs.s3n.awsSecretAccessKey&lt;span
 class=&quot;nt&quot;&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
+  &lt;span 
class=&quot;nt&quot;&gt;&amp;lt;value&amp;gt;&lt;/span&gt;SECRET&lt;span 
class=&quot;nt&quot;&gt;&amp;lt;/value&amp;gt;&lt;/span&gt;
+&lt;span class=&quot;nt&quot;&gt;&amp;lt;/property&amp;gt;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;The steps so far give Drill enough information to connect to the S3 
service.  Remember, you have to do this on all the nodes running 
drillbit.&lt;/p&gt;
+
+&lt;p&gt;Next, let&amp;#39;s go into the Drill web interface and enable the S3 
storage plugin.  In this case you only need to connect to 
&lt;strong&gt;one&lt;/strong&gt; of the nodes because Drill&amp;#39;s 
configuration is synchronized across the cluster.  Complete the following 
steps:&lt;/p&gt;
+
+&lt;ol&gt;
+&lt;li&gt;Point your browser to 
&lt;code&gt;http://&amp;lt;host&amp;gt;:8047&lt;/code&gt;&lt;/li&gt;
+&lt;li&gt;Select the &amp;#39;Storage&amp;#39; tab.&lt;/li&gt;
+&lt;li&gt;A good starting configuration for S3 can be entirely the same as the 
&lt;code&gt;dfs&lt;/code&gt; plugin, except the connection parameter is changed 
to &lt;code&gt;s3://bucket&lt;/code&gt;.  So first select the 
&lt;code&gt;Update&lt;/code&gt; button for &lt;code&gt;dfs&lt;/code&gt;, then 
select the text area and copy it into the clipboard (on Windows, ctrl-A, ctrl-C 
works).&lt;/li&gt;
+&lt;li&gt;Press &lt;code&gt;Back&lt;/code&gt;, then create a new plugin by 
typing the name into the &lt;code&gt;New Storage Plugin&lt;/code&gt; field, then 
press &lt;code&gt;Create&lt;/code&gt;.  You can choose any name, but a good 
convention is to use &lt;code&gt;s3-&amp;lt;bucketname&amp;gt;&lt;/code&gt; so 
you can easily identify it later.&lt;/li&gt;
+&lt;li&gt;In the configuration area, paste the configuration you just grabbed 
from &amp;#39;dfs&amp;#39;.  Change the line &lt;code&gt;connection: 
&amp;quot;file:///&amp;quot;&lt;/code&gt; to &lt;code&gt;connection: 
&amp;quot;s3://&amp;lt;bucket&amp;gt;&amp;quot;&lt;/code&gt;.&lt;/li&gt;
+&lt;li&gt;Click &lt;code&gt;Update&lt;/code&gt;.  You should see a message 
that indicates success.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;At this point you can run queries on the data directly and you have a 
couple of options on how you want to access it.  You can use Drill Explorer and 
create a custom view (based on an SQL query) that you can then access in 
Tableau or other BI tools, or just use Drill directly from within the 
tool.&lt;/p&gt;
+
+&lt;p&gt;You may want to check out the &lt;a 
href=&quot;http://www.youtube.com/watch?v=jNUsprJNQUg&quot;&gt;Tableau 
demo&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;With just a few lines of configuration, you&amp;#39;ve opened 
the vast world of data available in the Amazon cloud and reduced the amount of 
work you have to do in advance to access data stored there with SQL.  There are 
even some &lt;a href=&quot;https://aws.amazon.com/datasets&quot;&gt;public 
datasets&lt;/a&gt; available directly on S3 that are great for 
experimentation.&lt;/p&gt;
+
+&lt;p&gt;Happy Drilling!&lt;/p&gt;
+</description>
+        <pubDate>Fri, 09 Feb 2018 00:16:07 -0000</pubDate>
+        <link>/blog/2018/02/09/running-sql-queries-on-amazon-s3/</link>
+        <guid 
isPermaLink="true">/blog/2018/02/09/running-sql-queries-on-amazon-s3/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Drill 1.12 Released</title>
         <description>&lt;p&gt;Today, we&amp;#39;re happy to announce the 
availability of Drill 1.12.0. You can download it &lt;a 
href=&quot;https://drill.apache.org/download/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
 
@@ -398,74 +481,5 @@ exist. Instead, Drill now returns 
&lt;code&gt;null&lt;/code&gt; values for that
         
       </item>
     
-      <item>
-        <title>Drill 1.3 Released</title>
-        <description>&lt;p&gt;Today I&amp;#39;m happy to announce the 
availability of the Drill 1.3 release. This release addresses &lt;a 
href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&amp;amp;version=12332946&quot;&gt;58
 JIRAs&lt;/a&gt; on top of the 1.2 release. Highlights include:&lt;/p&gt;
-
-&lt;h2 id=&quot;enhanced-amazon-s3-support&quot;&gt;Enhanced Amazon S3 
Support&lt;/h2&gt;
-
-&lt;p&gt;Drill 1.3 utilizes a new library, called s3a, for reading data from 
S3. The s3a library includes improvements over the previous s3n library, such 
as higher performance and the ability to read large files (over 5GB).&lt;/p&gt;
-
-&lt;p&gt;In addition to the new s3a library, Drill 1.3 makes it easier to set 
up your AWS credentials. Simply edit the file 
&lt;code&gt;conf/core-site.xml&lt;/code&gt; in the Drill install directory. For 
more information, check out the &lt;a 
href=&quot;/docs/s3-storage-plugin/&quot;&gt;step-by-step 
instructions&lt;/a&gt; in the documentation.&lt;/p&gt;
-
-&lt;h2 id=&quot;heterogeneous-types&quot;&gt;Heterogeneous Types&lt;/h2&gt;
-
-&lt;p&gt;Drill 1.3 includes support for mixed-type columns, often found in 
systems like MongoDB and file formats like JSON. For example, Drill can now 
query columns that evolve from one data type to another over time.&lt;/p&gt;
-
-&lt;p&gt;Drill 1.3 provides a collection of functions that enable you to test 
the data type of a value. For example, if you have a column that has both lists 
(arrays) and numbers, you can use the following query to extract the first 
element from the array values:&lt;/p&gt;
-
-&lt;p&gt;&lt;code&gt;SELECT 1 + CASE WHEN is_list(a) THEN a[0] ELSE a END FROM 
table;&lt;/code&gt;&lt;/p&gt;
-
-&lt;h2 id=&quot;text-file-headers&quot;&gt;Text File Headers&lt;/h2&gt;
-
-&lt;p&gt;Drill is now able to parse the header row in a text file (CSV, TSV, 
etc.). Prior to Drill 1.3, data had to be accessed through the 
&lt;code&gt;columns&lt;/code&gt; array:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;SELECT 
columns[0], columns[1] FROM dfs.`/path/to/users.csv`
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;With Drill 1.3, you can use the actual column names in the CSV 
file:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;SELECT name, 
address FROM dfs.`/path/to/users.csv`
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Enabling header parsing is as simple as setting the 
&lt;code&gt;extractHeader&lt;/code&gt; parameter in the storage plugin 
configuration for the desired file extensions. For more information, check out 
&lt;a href=&quot;/docs/text-files-csv-tsv-psv/&quot;&gt;the 
documentation&lt;/a&gt;.&lt;/p&gt;
-
-&lt;h2 id=&quot;sequence-files&quot;&gt;Sequence Files&lt;/h2&gt;
-
-&lt;p&gt;Drill now &lt;a 
href=&quot;/docs/querying-sequence-files/&quot;&gt;supports sequence 
files&lt;/a&gt;, a format commonly used in the Hadoop ecosystem. A sequence 
file contains a series of keys and values, and querying it with Drill is as 
easy as querying any other self-describing format:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;SELECT *
-FROM dfs.tmp.`simple.seq`
-LIMIT 1;
-+--------------+---------------+
-|  binary_key  | binary_value  |
-+--------------+---------------+
-| [B@70828f46  | [B@b8c765f    |
-+--------------+---------------+
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Drill&amp;#39;s &lt;code&gt;CONVERT_FROM&lt;/code&gt; function makes 
it easy to decode the binary values:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code 
class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;SELECT 
CONVERT_FROM(binary_key, &amp;#39;UTF8&amp;#39;), CONVERT_FROM(binary_value, 
&amp;#39;UTF8&amp;#39;)
-FROM dfs.tmp.`simple.seq`
-LIMIT 1
-;
-+-----------+-------------+
-|  EXPR$0   |   EXPR$1    |
-+-----------+-------------+
-| key0      |   value0    |
-+-----------+-------------+
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;many-more-fixes&quot;&gt;Many More Fixes&lt;/h2&gt;
-
-&lt;p&gt;Drill 1.3 includes many other improvements, including enhancements 
related to querying Hive tables, MongoDB collections and Avro files. Check out 
the complete list of &lt;a 
href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&amp;amp;version=12332946&quot;&gt;fixes
 and enhancements&lt;/a&gt; for more information.&lt;/p&gt;
-
-&lt;p&gt;Download the &lt;a 
href=&quot;https://drill.apache.org/download/&quot;&gt;Drill 1.3 
release&lt;/a&gt; now and let us know your thoughts.&lt;/p&gt;
-
-&lt;p&gt;Drill On!
-Jacques Nadeau&lt;/p&gt;
-</description>
-        <pubDate>Mon, 23 Nov 2015 00:00:00 -0800</pubDate>
-        <link>/blog/2015/11/23/drill-1.3-released/</link>
-        <guid isPermaLink="true">/blog/2015/11/23/drill-1.3-released/</guid>
-        
-        
-        <category>blog</category>
-        
-      </item>
-    
   </channel>
 </rss>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/edc3f206/index.html
----------------------------------------------------------------------
diff --git a/index.html b/index.html
index 2543a6c..637a409 100644
--- a/index.html
+++ b/index.html
@@ -166,7 +166,7 @@ $(document).ready(function() {
 
 </div><!-- header -->
 <div class="alertbar">
-  <div class="news">News:</div><div><a 
href="/blog/2017/12/15/drill-1.12-released/">Drill 1.12 
Released</a><br/><span>(Bridget Bevens)</span></div><div><a 
href="/blog/2017/07/31/drill-1.11-released/">Drill 1.11 
Released</a><br/><span>(Bridget Bevens)</span></div>
+  <div class="news">News:</div><div><a 
href="/blog/2018/02/09/running-sql-queries-on-amazon-s3/">Running SQL Queries 
on Amazon S3</a><br/><span>(Nick Amato)</span></div><div><a 
href="/blog/2017/12/15/drill-1.12-released/">Drill 1.12 
Released</a><br/><span>(Bridget Bevens)</span></div>
 </div>
 
 <div class="mw introWrapper">
