Hi,

I am evaluating Drill for a requirement to query an HDFS cluster where the data is stored in Parquet file format. I was able to set up a three-node Drill cluster with ZooKeeper by following some links. In the storage plugin I configured HDFS with a connection to my HDFS URL, and I can successfully run SQL queries in the Drill web UI and get results, but this only gets data from one node.

I now have some basic questions:
1. Does the storage plugin need to point to the master node of the HDFS cluster?
2. Once a SQL query is fired, will it fetch data from all nodes in the cluster or just one node? Or do I have to set up Drill on YARN (https://drill.apache.org/docs/drill-on-yarn-introduction/) to get results from all nodes?
3. My requirement is to use JDBC to query the HDFS cluster (the data searched can grow large) in real time and display the results in a web UI. Please let me know whether Drill would be a good fit for this use case.
4. Are there any performance benchmarks of Drill against Presto and Impala?

Thanks in advance,
Sidd







