I don't really know anything about Hadoop encryption, so I will not address question 1.

2) The "filesystem" storage in Drill uses the Hadoop FileSystem API. The filesystem type is configured as part of the storage plugin configuration, in the "connection" field. When executing a query against any "filesystem" storage, Drill uses the getFileBlockLocations() method of the FileSystem API to get a list of blocks, along with the locations of each block. It uses this information to assign fragments to the Drillbits. Within each fragment, the FileSystem API is then used to read the data from the filesystem. I'm not sure how getFileBlockLocations() is implemented for the S3 filesystem, but I believe it splits the file based on some configuration property for block size. I'm also not sure which locations are returned for each block.

3) I haven't tried this, but if there is a Hadoop FileSystem implementation for both s3 and s3n, then both should work with Drill.

On Thu, Feb 5, 2015 at 10:42 PM, Derek Rabindran <[email protected]> wrote:
> Hi,
>
> My use case involves using Drill in combination with S3. I have a few
> questions:
>
> 1) Is it possible to decrypt the files before processing? My files are
> client-side encrypted. I'm able to provide the master key, however, I'm
> not sure at which level this should be configured.
>
> 2) What is Hadoop's role when using Drill with S3? Can you outline the
> details of what's actually happening when we execute a drill request on
> files residing in S3?
>
> 3) Will this work for both S3 and S3n?
>
> Thanks
>

--
Steven Phillips
Software Engineer
mapr.com
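To make the planning step from answer 2 a bit more concrete, here is a minimal, stdlib-only Java sketch. It is a hypothetical, simplified model, not Hadoop's getFileBlockLocations() and not Drill's actual assignment code: it assumes a file is cut into fixed-size byte ranges by a block-size setting (as guessed above for S3) and that those ranges are spread round-robin across Drillbits. The class name, the block size, and the Drillbit names are all made up for illustration; the real Drill planner may weigh data locality and other factors, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the planning step: split a file into byte ranges
 * ("blocks") and spread them across Drillbits. Not Drill's or Hadoop's code.
 */
public class BlockSplitSketch {

    /** A byte range [offset, offset + length) within a single file. */
    record Block(long offset, long length) {}

    /** Split a file of fileLength bytes into ranges of at most blockSize bytes. */
    static List<Block> splitIntoBlocks(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last block may be shorter than blockSize.
            blocks.add(new Block(offset, Math.min(blockSize, fileLength - offset)));
        }
        return blocks;
    }

    /** Assign blocks to Drillbits round-robin; data locality is ignored here. */
    static Map<String, List<Block>> assignRoundRobin(List<Block> blocks, List<String> drillbits) {
        if (drillbits.isEmpty()) {
            throw new IllegalArgumentException("need at least one drillbit");
        }
        Map<String, List<Block>> assignment = new LinkedHashMap<>();
        for (String bit : drillbits) {
            assignment.put(bit, new ArrayList<>());
        }
        for (int i = 0; i < blocks.size(); i++) {
            String bit = drillbits.get(i % drillbits.size());
            assignment.get(bit).add(blocks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 250 MB object and a 100 MB "block size".
        long mb = 1024L * 1024L;
        List<Block> blocks = splitIntoBlocks(250 * mb, 100 * mb);
        Map<String, List<Block>> plan =
                assignRoundRobin(blocks, List.of("drillbit-1", "drillbit-2"));
        plan.forEach((bit, assigned) -> System.out.println(bit + " -> " + assigned));
    }
}
```

With these made-up numbers the file becomes three ranges (100 MB, 100 MB, 50 MB), and drillbit-1 gets two of them while drillbit-2 gets one. Each Drillbit would then open the file through the FileSystem API and read only its assigned ranges.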
