Re: Query HDFS

Timothy Chen Mon, 21 Oct 2013 14:15:51 -0700

Nvm, for some reason I didn't catch the <namenode host ip> :)

I'll try this out with the AMPlab data set.


Tim


On Mon, Oct 21, 2013 at 2:12 PM, Timothy Chen <[email protected]> wrote:

> So does Drill try to contact HDFS through localhost then?
>
> I would imagine it needs to know the namenode location to start the HDFS
> connection.
>
> Tim
>
>
> On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips
> <[email protected]> wrote:
>
>> This is configured as part of the storage engine. For example, if you are
>> submitting a physical plan directly, you would set the dfsName property
>> to:
>> hdfs://<namenode host:ip>/
>>
>> If submitting a sql query through sqlline, you should modify the
>> storage-engines.json in the conf directory. For example, modify the
>> "parquet" config to this:
>>
>> "parquet" :
>>       {
>>         "type":"parquet",
>>         "dfsName" : "hdfs://<namenode host:ip>/"
>>       }
>>
>>
>> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > I'm also interested in querying data residing in HDFS.  Grateful for any
>> > advice on how to achieve this.
>> >
>> > Thanks,
>> >
>> > Tom
>> >
>> >
>> >
>> > On 18 October 2013 00:10, Timothy Chen <[email protected]> wrote:
>> >
>> >> Hey Steven/Jacques,
>> >>
>> >> If I want to query data that resides in HDFS, how do I query it in
>> >> sqlline?
>> >>
>> >> And how do I specify which HDFS namenode it should connect to for data?
>> >>
>> >> Since I got Drill deployable to EC2, I'm currently thinking of hooking up
>> >> the AMPLab benchmark dataset to see how we perform. That requires copying
>> >> the dataset from S3 to a distributed file system first, as one node won't
>> >> be able to contain it.
>> >>
>> >> Thanks!
>> >>
>> >> Tim
>> >>
>> >
>> >
>>
>
>
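
The "parquet" entry Steven describes above can be generated rather than
hand-edited. Below is a minimal sketch in Python, not part of the original
thread. It assumes the <namenode host:ip> placeholder stands for the
namenode's host and RPC port, since an HDFS URI normally takes the form
hdfs://host:port/. The helper names, the host namenode.example.com, and the
port 8020 are hypothetical; use whatever your cluster's fs.default.name (or
fs.defaultFS) points at.

```python
import json


def hdfs_uri(host: str, port: int) -> str:
    """Build an HDFS URI of the form hdfs://host:port/ (hypothetical helper)."""
    return f"hdfs://{host}:{port}/"


def parquet_engine(host: str, port: int) -> dict:
    """Return a "parquet" storage-engine entry shaped like the one in the thread."""
    return {"type": "parquet", "dfsName": hdfs_uri(host, port)}


if __name__ == "__main__":
    # Assumed values: replace with your own namenode host and RPC port.
    entry = {"parquet": parquet_engine("namenode.example.com", 8020)}
    # Print the fragment so it can be pasted into conf/storage-engines.json.
    print(json.dumps(entry, indent=2))
```

The script only prints the fragment. Where it goes inside
storage-engines.json depends on the layout of the file shipped with your
Drill build, so check the existing "parquet" entry before replacing it.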
