Re: Query HDFS

Timothy Chen Mon, 21 Oct 2013 14:13:30 -0700

So does Drill try to contact HDFS through localhost then?

I would imagine it needs to know the namenode location to start the HDFS
connection.


Tim


On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips <[email protected]> wrote:

> This is configured as part of the storage engine. For example, if you are
> submitting a physical plan directly, you would set the dfsName property to:
> hdfs://<namenode host:port>/
>
> If submitting a sql query through sqlline, you should modify the
> storage-engines.json in the conf directory. For example, modify the
> "parquet" config to this:
>
> "parquet" :
>       {
>         "type":"parquet",
>         "dfsName" : "hdfs://<namenode host:port>/"
>       }
>
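
For anyone following along, here is a minimal sketch of building that
"parquet" entry programmatically. It assumes dfsName takes the form
hdfs://<host>:<port>/; the function name, host, and port below are
hypothetical, and only the "type" and "dfsName" keys come from the example
above. How the entry nests inside the rest of storage-engines.json is not
shown in this thread, so the sketch only prints the fragment.

```python
import json


def parquet_engine_config(namenode_host, namenode_port):
    """Build the "parquet" entry with dfsName pointing at the namenode.

    Hypothetical helper: only the two keys quoted in the thread are set.
    """
    dfs_name = "hdfs://{}:{}/".format(namenode_host, namenode_port)
    return {"type": "parquet", "dfsName": dfs_name}


# Hypothetical host and port, for illustration only; use your own namenode.
entry = parquet_engine_config("namenode.example.com", 8020)

# Print the fragment so it can be pasted into the existing config by hand.
print(json.dumps({"parquet": entry}, indent=2))
```

Generating the value this way keeps the scheme, host, port, and trailing
slash in one place, which may help when the same namenode is referenced by
more than one storage engine entry.
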
>
> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[email protected]>
> wrote:
>
> > Hi,
> >
> > I'm also interested in querying data residing in HDFS.  Grateful for any
> > advice on how to achieve this.
> >
> > Thanks,
> >
> > Tom
> >
> >
> >
> > On 18 October 2013 00:10, Timothy Chen <[email protected]> wrote:
> >
> >> Hey Steven/Jacques,
> >>
> >> If I want to query data that resides in HDFS, how do I query it in sqlline?
> >>
> >> And how do I specify which HDFS namenode it should connect to for data?
> >>
> >> Since I got Drill deployable to EC2, I'm currently thinking of hooking up
> >> the AMPLab Benchmark dataset to see how we perform, and the dataset needs
> >> to be copied from S3 to a distributed file system first, as one node won't
> >> be able to contain it.
> >>
> >> Thanks!
> >>
> >> Tim
> >>
> >
> >
>
