This is an automated email from the ASF dual-hosted git repository.

mmiller pushed a commit to branch master
in repository

The following commit(s) were added to refs/heads/master by this push:
     new b9b8aea  Blog post to configure Accumulo with Azure Data Lake Gen2 Storage (#198)
b9b8aea is described below

commit b9b8aea71bf9b4ddbb697310a50be4768eb1d3bf
Author: Karthick Narendran <>
AuthorDate: Thu Oct 17 15:55:00 2019 +0100

    Blog post to configure Accumulo with Azure Data Lake Gen2 Storage (#198)
 _posts/blog/ | 133 ++++++++++++++++++++++
 1 file changed, 133 insertions(+)

diff --git a/_posts/blog/ 
new file mode 100644
index 0000000..03288aa
--- /dev/null
+++ b/_posts/blog/
@@ -0,0 +1,133 @@
+title: "Using Azure Data Lake Gen2 storage as a data store for Accumulo"
+author: Karthick Narendran
+Accumulo can store its files in Azure Data Lake Storage Gen2
+using the ABFS (Azure Blob File System) driver.
+Similar to the earlier S3 setup,
+the write-ahead logs and Accumulo metadata can be stored in HDFS and everything else on Gen2 storage
+using the volume chooser feature introduced in Accumulo 2.0. The configurations referred to in this post
+are specific to Accumulo 2.0 and Hadoop 3.2.0.
+## Hadoop setup
+For the ABFS client to talk to Gen2 storage, it requires one of the authentication
+mechanisms listed in the Hadoop Azure documentation.
+This post covers Azure Managed Identities,
+formerly known as Managed Service Identity or MSI. This feature provides Azure
+services with an automatically managed identity in Azure Active Directory,
+and it avoids the need for credentials or other sensitive information to
+be stored in code or configs/JCEKS. Plus, it comes free with Azure AD.
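+MSI only works if the compute hosts actually carry a managed identity that has
+rights on the storage account. As a hypothetical sketch using the Azure CLI
+(the resource group, VM name, and scope below are placeholders, not values from
+this post):
+
```shell
# Assign a system-assigned managed identity to an existing VM.
# Resource group and VM names are placeholders; adjust to your environment.
az vm identity assign --resource-group my-rg --name my-accumulo-node

# Grant that identity data access on the storage account, e.g. via the
# built-in "Storage Blob Data Contributor" role.
az role assignment create \
  --assignee <identity-principal-id> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub>/resourceGroups/my-rg/providers/Microsoft.Storage/storageAccounts/<storage_account>"
```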
+At least the following should be added to Hadoop's `core-site.xml` on each node
+(the property names below follow the Hadoop ABFS documentation for the MSI token provider):
+
+<property>
+  <name>fs.azure.account.auth.type</name>
+  <value>OAuth</value>
+</property>
+<property>
+  <name>fs.azure.account.oauth.provider.type</name>
+  <value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
+</property>
+<property>
+  <name>fs.azure.account.oauth2.msi.tenant</name>
+  <value>TenantID</value>
+</property>
+<property>
+  <name>fs.azure.account.oauth2.client.id</name>
+  <value>ClientID</value>
+</property>
+See the Hadoop ABFS documentation
+for more information on Hadoop Azure support.
+To get the `hadoop` command to work with ADLS Gen2, set the
+following entries in `hadoop-env.sh`. As Gen2 storage is TLS enabled by default,
+it is important we use the native OpenSSL implementation of TLS.
+
+export HADOOP_OPTIONAL_TOOLS="hadoop-azure"
+export HADOOP_OPTS="-Dorg.wildfly.openssl.path=<path/to/OpenSSL/libraries> ${HADOOP_OPTS}"
+
+To verify the location of the OpenSSL libraries, run the `whereis libssl` command
+on the host.
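+With the Hadoop pieces in place, access can be sanity-checked from the command
+line. A minimal check, where `<file_system>` and `<storage_account>` are
+placeholders for your container and storage account:
+
```shell
# List the root of the Gen2 filesystem through the ABFS driver.
# <file_system> and <storage_account> are placeholders.
hadoop fs -ls abfss://<file_system>@<storage_account>.dfs.core.windows.net/
```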
+## Accumulo setup
+For each node in the cluster, modify `accumulo-env.sh` to add the Azure storage jars to the
+classpath. Your versions may differ depending on your Hadoop version; the
+following versions were included with Hadoop 3.2.0.
+Adding `-Dorg.wildfly.openssl.path` to `JAVA_OPTS` in `accumulo-env.sh` was also tried, but it
+did not appear to work; this needs further investigation.
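+A sketch of what the classpath additions in `accumulo-env.sh` could look like,
+assuming Hadoop's bundled tool jars (the jar versions below are assumptions for
+Hadoop 3.2.0; verify against `$HADOOP_HOME/share/hadoop/tools/lib` on your hosts):
+
```shell
# Append the Azure storage jars shipped with Hadoop to Accumulo's classpath.
# Jar versions are assumptions; check $HADOOP_HOME/share/hadoop/tools/lib.
CLASSPATH="${CLASSPATH}:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-azure-3.2.0.jar"
CLASSPATH="${CLASSPATH}:${HADOOP_HOME}/share/hadoop/tools/lib/wildfly-openssl-1.0.4.Final.jar"
export CLASSPATH
```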
+Set the following in `accumulo.properties` and then run `accumulo init`, but don't start Accumulo.
+
+instance.volumes=hdfs://<name node>/accumulo
+After running `accumulo init` we need to configure storing write-ahead logs in
+HDFS. Set the following in `accumulo.properties`.
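+A sketch of what that configuration could look like, assuming Accumulo 2.0's
+`PreferredVolumeChooser` and its `general.custom.volume.preferred.*` properties;
+the abfss URI components and `<name node>` are placeholders:
+
```properties
# accumulo.properties: prefer Gen2 for table files, HDFS for write-ahead logs.
# The abfss URI components are placeholders for your container and account.
instance.volumes=hdfs://<name node>/accumulo,abfss://<file_system>@<storage_account>.dfs.core.windows.net/accumulo
general.volume.chooser=org.apache.accumulo.server.fs.PreferredVolumeChooser
general.custom.volume.preferred.default=abfss://<file_system>@<storage_account>.dfs.core.windows.net/accumulo
general.custom.volume.preferred.logger=hdfs://<name node>/accumulo
```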
+Run `accumulo init --add-volumes` to initialize the Azure DLS Gen2 volume. Doing this
+in two steps avoids putting any Accumulo metadata files in Gen2 storage during init.
+Copy `accumulo.properties` to all nodes and start Accumulo.
+Individual tables can be configured to store their files in HDFS by setting the
+table property `table.custom.volume.preferred`. This should be set for the
+metadata table, in case it splits, using the following Accumulo shell command:
+
+config -t accumulo.metadata -s table.custom.volume.preferred=hdfs://<name node>/accumulo
+## Accumulo example
+The following Accumulo shell session shows an example of writing data to Gen2 and
+reading it back. It also shows scanning the metadata table to verify the data
+is stored in Gen2.
+root@muchos> createtable gen2test
+root@muchos gen2test> insert r1 f1 q1 v1
+root@muchos gen2test> insert r1 f1 q2 v2
+root@muchos gen2test> flush -w
+2019-10-16 08:01:00,564 [shell.Shell] INFO : Flush of table gen2test  
+root@muchos gen2test> scan
+r1 f1:q1 []    v1
+r1 f1:q2 []    v2
+root@muchos gen2test> scan -t accumulo.metadata -c file
+ []    234,2
+These instructions will help configure Accumulo to use Azure Data Lake
+Gen2 Storage along with HDFS. With this setup,
+we were able to successfully run the continuous ingest test. Going forward,
+we'll experiment more in this space
+with ADLS Gen2 and add to or update this blog as we go.
