Nvm, for some reason I didn't catch the <namenode host:ip> :) I'll try this out with the AMPLab data set.
Tim

On Mon, Oct 21, 2013 at 2:12 PM, Timothy Chen <[email protected]> wrote:

> So does Drill try to contact HDFS through localhost then?
>
> I would imagine it needs to know the namenode location to start the HDFS
> connection.
>
> Tim
>
>
> On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips
> <[email protected]> wrote:
>
>> This is configured as part of the storage engine. For example, if you are
>> submitting a physical plan directly, you would set the dfsName property
>> to:
>> hdfs://<namenode host:ip>/
>>
>> If submitting a SQL query through sqlline, you should modify the
>> storage-engines.json in the conf directory. For example, modify the
>> "parquet" config to this:
>>
>> "parquet" :
>> {
>>   "type":"parquet",
>>   "dfsName" : "hdfs://<namenode host:ip>/"
>> }
>>
>>
>> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > I'm also interested in querying data residing in HDFS. Grateful for any
>> > advice on how to achieve this.
>> >
>> > Thanks,
>> >
>> > Tom
>> >
>> >
>> > On 18 October 2013 00:10, Timothy Chen <[email protected]> wrote:
>> >
>> >> Hey Steven/Jacques,
>> >>
>> >> If I want to query data that resides in HDFS, how do I query it in
>> >> sqlline?
>> >>
>> >> And how do I specify which HDFS namenode it should connect to for data?
>> >>
>> >> Since I got Drill deployable to EC2, I'm currently thinking of hooking
>> >> up the AMPLab Benchmark dataset to see how we perform. It needs the
>> >> dataset copied from S3 to a distributed file system first, as one node
>> >> won't be able to hold it.
>> >>
>> >> Thanks!
>> >>
>> >> Tim
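
For anyone scripting the change Steven describes, here is a rough sketch in Python. It only shows the edit to the "parquet" entry quoted above. The helper name set_dfs_name, the "file:///" starting value, and the namenode.example.com:8020 address are made up for illustration, and it assumes you have already loaded the map of engine configs out of storage-engines.json (the surrounding layout of that file is not shown in this thread, so check your own copy).

```python
import json


def set_dfs_name(engines, engine, namenode):
    """Return a copy of `engines` with `dfsName` for `engine` set to an hdfs:// URI.

    `engines` is assumed to map engine names to their config objects,
    as in the "parquet" snippet from the thread.
    """
    # Round-trip through JSON for a deep copy, so the input is left untouched.
    updated = json.loads(json.dumps(engines))
    updated[engine]["dfsName"] = "hdfs://" + namenode + "/"
    return updated


# Hypothetical starting config; the "file:///" value is an assumption.
engines = {"parquet": {"type": "parquet", "dfsName": "file:///"}}

# Placeholder namenode address; substitute your own.
result = set_dfs_name(engines, "parquet", "namenode.example.com:8020")
print(json.dumps(result, indent=2))
```

The result still needs to be written back into storage-engines.json in the conf directory before starting sqlline.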
