Re: Apache Drill and S3

Steven Phillips Sun, 08 Feb 2015 02:01:17 -0800

I don't really know anything about Hadoop encryption, so I will not address
question 1.

2) The "filesystem" storage in Drill uses the Hadoop FileSystem API. The
filesystem type is configured as part of the storage plugin configuration,
in the "connection" field.

When executing a query against any "filesystem" storage, Drill uses the
getFileBlockLocations() method of the FileSystem API to get a list of
blocks along with the locations of each block. It uses this information to
assign fragments to the drillbits. Within each fragment, the FileSystem API
is used to read the data from the filesystem.
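
As a rough illustration of the assignment step, here is a minimal sketch in
plain Java. It is not Drill's actual scheduler: the class name, the method
name, and the "prefer a drillbit on a host that holds the block, otherwise
round-robin" rule are all assumptions made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Drill code: map each block to one drillbit host.
class FragmentAssignmentSketch {

    // blockHosts.get(i) lists the hosts reported as holding block i.
    // drillbits lists the hosts that run a drillbit.
    static Map<Integer, String> assign(List<List<String>> blockHosts,
                                       List<String> drillbits) {
        if (drillbits.isEmpty()) {
            throw new IllegalArgumentException("need at least one drillbit");
        }
        Map<Integer, String> assignment = new LinkedHashMap<>();
        int nextFallback = 0;
        for (int i = 0; i < blockHosts.size(); i++) {
            String chosen = null;
            // Assumed rule: prefer a drillbit on a host that holds the block.
            for (String host : blockHosts.get(i)) {
                if (drillbits.contains(host)) {
                    chosen = host;
                    break;
                }
            }
            // Assumed fallback: no local drillbit, so hand out round-robin.
            if (chosen == null) {
                chosen = drillbits.get(nextFallback % drillbits.size());
                nextFallback++;
            }
            assignment.put(i, chosen);
        }
        return assignment;
    }
}
```

With a remote store, none of the reported hosts would be expected to match
a drillbit, so in this sketch every block would take the fallback path.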

I'm not sure how the getFileBlockLocations() method is implemented for the
s3 filesystem, but I believe it splits the file based on some configuration
property for block size. I am not sure which hosts are returned as the
block locations.
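
If the split really is driven only by a block size setting, the arithmetic
would look something like the sketch below. This is an assumption about the
behaviour, not the s3 filesystem's code; the class and method names are
hypothetical, and the 128 MB block size in main() is just an example value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cut a file of a given length into fixed-size blocks.
class BlockSplitSketch {

    // Returns {offset, length} pairs covering [0, fileLength).
    static List<long[]> splitIntoBlocks(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last block may be shorter than blockSize.
            long length = Math.min(blockSize, fileLength - offset);
            blocks.add(new long[] {offset, length});
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Example values only: a 300 MB file with an assumed 128 MB block size.
        for (long[] block : splitIntoBlocks(300 * mb, 128 * mb)) {
            System.out.println("offset=" + block[0] + " length=" + block[1]);
        }
    }
}
```

Each {offset, length} pair would then become one unit of work that can be
read independently through the filesystem API.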

3) I haven't tried this, but if there is a filesystem implementation for s3
and s3n, then they should both work with Drill.

On Thu, Feb 5, 2015 at 10:42 PM, Derek Rabindran <[email protected]> wrote:

> Hi,
>
> My use case involves using Drill in combination with S3.  I have a few
> questions:
>
> 1) Is it possible to decrypt the files before processing?  My files are
> client-side encrypted.  I'm able to provide the master key, however, I'm
> not sure at which level this should be configured.
>
> 2) What is Hadoop's role when using Drill with S3?  Can you outline the
> details of what's actually happening when we execute a drill request on
> files residing in S3?
>
> 3) Will this work for both S3 and S3n?
>
> Thanks
>

-- 
 Steven Phillips
 Software Engineer

 mapr.com
