[this post is available online at https://s.apache.org/QyEK ]

by Charles Givre

Let me start out by saying that I am not a developer. I do have a technical 
background, but I hadn't coded in Java for at least 10 years before I got 
involved in the Apache Drill project. One has to wonder how, as a 
non-developer, I ended up as a committer for the Drill project. In this blog 
post, I'd like to share with you how I came to be involved with the Drill 
project.

But first, why Drill?

I first heard about Drill at an industry conference several years ago. I was 
speaking with Dr. Ellen Friedman about some data issues we were having, and she 
casually asked whether I had tried Drill. I had not heard of it at that point, so 
I did some research and it seemed as if Drill could solve a lot of problems 
that my clients were having. But then, I tried using it and kept getting stuck. 

If you aren't familiar with Apache Drill, it is an SQL engine that allows 
you to query any kind of self-describing data. After experimenting with Drill 
for a while, I was impressed enough to think that the tool had major potential 
in security. One of the biggest problems that Drill solves is the need to 
Extract, Transform, Load (ETL) data into an analytic tool before actually doing 
analysis of that data. This ETL process adds no real value, and it costs large 
enterprises millions of dollars while adding unnecessary delays between the 
time data is ingested and the time it is actually available for analysis. In 
security applications, this delay directly translates into risk. The longer it 
takes to make your data available, the longer it takes to find malicious 
activity, and hence the greater the risk. 
Therefore, if you're able to query the data without having to do any kind of 
ETL or ingestion, you are lowering your risk as well as potentially saving 
millions of dollars.
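
As a minimal sketch of what querying without ETL can look like, the Python 
below submits SQL to Drill over its REST API. It assumes a Drill instance 
running locally with the REST API on port 8047 and the dfs storage plugin 
enabled. The file path and the helper names are hypothetical and only for 
illustration.

```python
import json
import urllib.request

# Assumption: Drill is running locally and its REST API listens on port 8047.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical path: a raw JSON log file queried where it sits, with no load step.
EXAMPLE_SQL = "SELECT * FROM dfs.`/tmp/logs/events.json` LIMIT 10"


def build_query_request(sql, url=DRILL_URL):
    """Build (but do not send) the HTTP request that submits SQL to Drill."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query to Drill and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(sql, url)) as response:
        return json.load(response)
```

With Drill running, calling run_query(EXAMPLE_SQL) should return a JSON 
document describing the result set. The point is that the file is read in 
place: nothing is transformed or loaded into another system first.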

Getting Involved

Unfortunately, when I started using Drill, I saw this potential, but I couldn't 
get it to work. My next step from here was to try to get assistance at my 
company. I pitched the ideas to my company leadership, but it proved very 
difficult to get the company to pull Java developers from revenue-generating 
projects to work on this "pie-in-the-sky", unproven project. After spending 
several months on this, I got really frustrated and decided that I was going to 
try to do it myself. However, I really had no idea what I was doing. I hadn't 
coded in Java for at least 10 years at the time, and had zero experience with 
all the modern Java development tools such as Maven and Git. What I did have 
was persistence, so I started asking for help and decided that I was going to 
dive right in and start adding the functionality that I felt Drill needed to be 
useful in security applications. I started working on something that someone 
else had started: the HTTPD format plugin for Drill. Most of the coding was 
done, but there was still enough left for me to get my hands dirty and start 
figuring things out.

What I Learned

I still would not consider myself a developer, but in getting that 
particular item committed to the codebase, I learned a lot about how open 
source projects actually work, as well as about writing production-quality 
code. Since then, I've tried to add at least one bit of new functionality to 
each Drill release. I would encourage anyone who is interested in contributing 
to an Open Source project at the Apache Software Foundation to dive right in 
and start. I still have a lot of ideas for Drill, and I hope to have the time 
to see them through to implementation.

In conclusion, I'm fairly certain that my involvement with Drill and the Apache 
Software Foundation is really just beginning. I'm currently working on the 
O'Reilly book about Apache Drill with a fellow Drill committer. It is my hope 
that the book will spark additional interest in Apache Drill. Open Source 
software is at the heart of the ongoing data revolution, which is dramatically 
expanding what is possible with data. I firmly believe that Apache Drill will 
have a role to play in this data revolution and I'm honored to have the 
opportunity to play a small role in developing Drill.

Charles Givre, CISSP, is a Lead Data Scientist at Deutsche Bank, where he works in 
the Chief Information Security Office (CISO). Mr. Givre is an active data 
science instructor and regularly teaches classes about data science and 
security at various industry conferences, such as BlackHat. Mr. Givre is a 
committer for the Apache Drill project and, together with Mr. Paul Rogers, is 
working on the forthcoming O’Reilly book about Apache Drill. He can be reached 
at cgivre(at)apache(dot)org.  

= = =

"Success at Apache" is a monthly blog series that focuses on the processes 
behind why the ASF "just works".

# # #
