So does Drill try to contact HDFS through localhost then? I would imagine it needs to know the namenode location to start the HDFS connection.

Tim

On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips <[email protected]> wrote:

> This is configured as part of the storage engine. For example, if you are
> submitting a physical plan directly, you would set the dfsName property to:
> hdfs://<namenode host:ip>/
>
> If submitting a SQL query through sqlline, you should modify the
> storage-engines.json in the conf directory. For example, modify the
> "parquet" config to this:
>
> "parquet" :
> {
>   "type":"parquet",
>   "dfsName" : "hdfs://<namenode host:ip>/"
> }
>
>
> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[email protected]> wrote:
>
> > Hi,
> >
> > I'm also interested in querying data residing in HDFS. Grateful for any
> > advice on how to achieve this.
> >
> > Thanks,
> >
> > Tom
> >
> >
> > On 18 October 2013 00:10, Timothy Chen <[email protected]> wrote:
> >
> >> Hey Steven/Jacques,
> >>
> >> If I want to query data residing in HDFS, how do I query this in sqlline?
> >>
> >> And how do I specify which HDFS namenode it should connect to for data?
> >>
> >> Since I got Drill deployable to EC2, I'm currently thinking to hook the
> >> AMPLabs Benchmark dataset and see how we perform, and it needs to copy the
> >> dataset from s3 to a distributed file system first as one node won't be
> >> able to contain it.
> >>
> >> Thanks!
> >>
> >> Tim
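
For reference, the sqlline route Steven describes could be scripted roughly as below. This is only a sketch under assumptions: it treats the engine config as a plain mapping keyed by engine name (the real layout of storage-engines.json may differ), `set_parquet_dfs` is a hypothetical helper name, and the host `namenode.example.com:8020` and the starting `file:///` value are placeholders rather than anything taken from Drill.

```python
import json


def set_parquet_dfs(engines: dict, namenode: str) -> dict:
    """Return a copy of the engine config with "parquet" pointed at HDFS.

    `namenode` is the namenode address to put after hdfs://, for example
    "namenode.example.com:8020" (placeholder value, not a Drill default).
    """
    updated = dict(engines)  # shallow copy so the caller's mapping is untouched
    updated["parquet"] = {
        "type": "parquet",
        "dfsName": f"hdfs://{namenode}/",
    }
    return updated


if __name__ == "__main__":
    # Assumed starting config; the actual file contents are not shown in this thread.
    engines = {"parquet": {"type": "parquet", "dfsName": "file:///"}}
    print(json.dumps(set_parquet_dfs(engines, "namenode.example.com:8020"), indent=2))
```

The helper only builds the entry from Steven's example; writing it back into conf/storage-engines.json is left out because the surrounding file structure is not confirmed here.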
