You might also try querying the data in S3 by using the S3 URI.
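
As an untested sketch: assuming the dfsName property accepts any Hadoop filesystem URI, and that the S3 credentials are already set in the Hadoop configuration (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey for the s3n scheme), the "parquet" entry in storage-engines.json might look something like this, where <bucket name> is a placeholder:

    "parquet" :
      {
        "type":"parquet",
        "dfsName" : "s3n://<bucket name>/"
      }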

On Mon, Oct 21, 2013 at 2:14 PM, Timothy Chen <[email protected]> wrote:

> Nvm, some reason I didn't catch the <namenode host ip> :)
>
> I'll try this out with the AMPlab data set.
>
> Tim
>
>
> On Mon, Oct 21, 2013 at 2:12 PM, Timothy Chen <[email protected]> wrote:
>
> > So does Drill try to contact HDFS through localhost then?
> >
> > I would imagine it needs to know the namenode location to start the HDFS
> > connection.
> >
> > Tim
> >
> >
> > On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips <[email protected]> wrote:
> >
> >> This is configured as part of the storage engine. For example, if you are
> >> submitting a physical plan directly, you would set the dfsName property to:
> >> hdfs://<namenode host:ip>/
> >>
> >> If submitting a sql query through sqlline, you should modify the
> >> storage-engines.json in the conf directory. For example, modify the
> >> "parquet" config to this:
> >>
> >> "parquet" :
> >>   {
> >>     "type":"parquet",
> >>     "dfsName" : "hdfs://<namenode host:ip>/"
> >>   }
> >>
> >>
> >> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I'm also interested in querying data residing in HDFS. Grateful for any
> >> > advice on how to achieve this.
> >> >
> >> > Thanks,
> >> >
> >> > Tom
> >> >
> >> >
> >> > On 18 October 2013 00:10, Timothy Chen <[email protected]> wrote:
> >> >
> >> >> Hey Steven/Jacques,
> >> >>
> >> >> If I want to query data resides in HDFS, how do I query this in sqlline?
> >> >>
> >> >> And how do I specify which HDFS namenode it should connect to for data?
> >> >>
> >> >> Since I got Drill deployable to EC2, I'm currently thinking to hook the
> >> >> AMPLabs Benchmark dataset and see how we perform, and it needs to copy the
> >> >> dataset from s3 to a distributed file system first as one node won't able
> >> >> to contain it.
> >> >>
> >> >> Thanks!
> >> >>
> >> >> Tim
