RE: Distributed SQL on Drill question

Chris Drawater Wed, 22 Oct 2014 17:02:30 -0700

We have now resolved our JDBC and ODBC connection issues.
It turns out that, although we used IP addresses throughout our 
configurations, somewhere down the line the hostname of the Drillbit started 
to be used instead (see the "Connecting to server drill1:31010" line in the 
log below):


16:42:21.834 [main-SendThread(10.44.18.101:2181)] DEBUG 
org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x1493348d1c30005, 
packet:: clientPath:null serverPath:null finished:false header:: 9,4  
replyHeader:: 9,316,0  request:: 
'/drill/drillbits1/1b8d393d-9802-4e0b-9115-b60e3154829a,F  response:: 
#a2431623864333933642d393830322d346530622d393131352d62363065333135343832396110fffffff1fffffff0ffffffa4ffffff9affffff93291a14a66472696c6c3110ffffffa2fffffff2118ffffffa3fffffff2120ffffffa4fffffff21,s{308,308,1413904677021,1413904677021,0,0,0,92661655187685376,67,0,308}
16:42:22.006 [main] DEBUG i.n.c.MultithreadEventLoopGroup - 
-Dio.netty.eventLoopThreads: 8
16:42:22.021 [main] DEBUG io.netty.channel.nio.NioEventLoop - 
-Dio.netty.noKeySetOptimization: false
16:42:22.021 [main] DEBUG io.netty.channel.nio.NioEventLoop - 
-Dio.netty.selectorAutoRebuildThreshold: 512
16:42:22.053 [main] DEBUG o.a.drill.exec.client.DrillClient - Connecting to 
server drill1:31010
16:42:24.767 [main] DEBUG i.n.util.internal.ThreadLocalRandom - 
-Dio.netty.initialSeedUniquifier: 0x2430146dd102a22c
16:42:24.783 [main] DEBUG i.n.channel.ChannelOutboundBuffer - 
-Dio.netty.threadLocalDirectBufferSize: 65536
16:42:24.783 [main] DEBUG io.netty.util.Recycler - 
-Dio.netty.recycler.maxCapacity.default: 262144
16:42:24.798 [main] DEBUG io.netty.buffer.ByteBufUtil - 
-Dio.netty.allocator.type: unpooled
16:42:24.814 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - 
Generated: io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
16:42:24.814 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - 
Generated: 
io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher
16:42:24.829 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - 
Generated: 
io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher
16:42:24.845 [Client-1] INFO  o.a.drill.exec.rpc.user.UserClient - Channel 
closed between local null and remote null
SQLException : SQL state: null java.sql.SQLException: Failure while attempting 
to connect to Drill. ErrorCode: 0

The resolution (for our test purposes) was simply to add an entry for the 
Drillbit's hostname to C:\Windows\System32\drivers\etc\hosts.
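
For anyone hitting the same symptom, one way to confirm whether the client 
machine can resolve the Drillbit's hostname is sketched below. It is only an 
illustration: "drill1" is the name that appears in the "Connecting to server" 
log line above, and the class name ResolveCheck is made up for this example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    /** Returns the resolved IP address as text, or null if the name cannot be resolved. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "drill1" is the hostname seen in the DrillClient log line;
        // pass a different name as the first argument to check another host.
        String host = args.length > 0 ? args[0] : "drill1";
        String ip = resolve(host);
        System.out.println(ip == null
                ? host + " does not resolve from this machine"
                : host + " resolves to " + ip);
    }
}
```

If the name does not resolve, a hosts-file entry (or a DNS record) for it is 
the likely fix, as it was in our case.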

But I'd still appreciate answers to my questions:


·         Would we expect to be able to run a distributed SQL query by 
connecting via JDBC directly to a specific Drillbit? It seems not?

·         Would we expect to be able to run a distributed SQL query by 
connecting via ODBC directly to a specific Drillbit?

·         Would we expect to be able to run a distributed SQL query by 
connecting via JDBC through a ZooKeeper quorum connection?

·         Would we expect to be able to run a distributed SQL query by 
connecting via ODBC through a ZooKeeper quorum connection?

·         How can we tell from the explain plan output or the JSON QEP that a 
distributed SQL query used multiple nodes?
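
To make the JDBC variants concrete, the sketch below builds the two URL forms 
described in the Drill JDBC driver documentation: a direct Drillbit connection 
and a ZooKeeper connection. The host, port, ZooKeeper root and cluster id are 
taken from the log above ("drill1:31010", "10.44.18.101:2181", 
"/drill/drillbits1"). The class and method names are hypothetical, and the 
exact URL syntax accepted by the 0.5/0.6 drivers has not been verified here.

```java
public class DrillUrls {
    /** URL for connecting straight to one Drillbit's user port. */
    static String direct(String host, int port) {
        return "jdbc:drill:drillbit=" + host + ":" + port;
    }

    /**
     * URL for connecting through ZooKeeper. zkRoot and clusterId are assumed
     * to match the values the Drillbits registered under (here /drill/drillbits1).
     */
    static String viaZooKeeper(String zkQuorum, String zkRoot, String clusterId) {
        return "jdbc:drill:zk=" + zkQuorum + "/" + zkRoot + "/" + clusterId;
    }

    public static void main(String[] args) {
        // Values below come from the log in this message; adjust for your cluster.
        System.out.println(direct("drill1", 31010));
        System.out.println(viaZooKeeper("10.44.18.101:2181", "drill", "drillbits1"));
    }
}
```

With the ZooKeeper form, the client is handed the Drillbit address that was 
registered in ZooKeeper, which is why the hostname had to be resolvable on 
the client.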

Thanks,
    Chris


From: Chris Drawater
Sent: 22 October 2014 10:39
To: '[email protected]'
Subject: Distributed SQL on Drill question


Hi,

We have started to evaluate Apache Drill.

Using multiple nodes/VMs, we wish to stream JSON data (via Apache Storm) to a 
persistent local filesystem (with a consistent directory structure so we can 
define various storage plugins) and then use Drill to run distributed SQL 
queries across these JSON files.
If at all possible, we don't wish to install Hadoop HDFS/HBase/Hive.


SQL execution would hopefully be via ODBC and/or JDBC.

So far, embedded Drill experiments work just great, but the data processed is 
local to the machine running sqlline.
Using SQL against JSON works really well.

Unfortunately, experiments to run distributed SQL queries have (so far) not 
been successful.
We have tried

·         a 3-node VM-based system with a 3-node ZooKeeper quorum + 1 Drillbit 
per node

·         a 3-node VM-based system with a single ZooKeeper instance (covering 
all 3 nodes) + again 1 Drillbit per node

and 'select * from sys.drillbits' from sqlline lists all 3 Drillbits in both 
cases. Likewise, ZooKeeper confirms the existence of the Drill cluster.
But we've not managed to run a distributed query.

We have tested against both the 0.5 release and a 0.6 build on 64-bit Ubuntu 
14.04.

So far, we can only connect via ODBC to a specific Drillbit.
We cannot get the JDBC driver (using SQuirreL) to work.
Also, ODBC to a ZooKeeper quorum doesn't appear to work, although we have 
tested client access using 'telnet IP_ADDR 2181' and that's OK.
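
The telnet test only shows that the ZooKeeper port is reachable; a client also 
has to reach the Drillbit it is directed to. A minimal sketch of the same 
check in Java, covering both ports, follows. The class name PortCheck and the 
default hosts are assumptions for illustration ("10.44.18.101" and "drill1" 
are the ZooKeeper address and Drillbit name seen in the client log elsewhere 
in this thread, and 31010 is the Drillbit port shown there).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** True if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative defaults; override with <zkHost> <drillbitHost> arguments.
        String zkHost = args.length > 0 ? args[0] : "10.44.18.101";
        String drillbitHost = args.length > 1 ? args[1] : "drill1";
        System.out.println("ZooKeeper " + zkHost + ":2181 reachable: "
                + canConnect(zkHost, 2181, 3000));
        System.out.println("Drillbit " + drillbitHost + ":31010 reachable: "
                + canConnect(drillbitHost, 31010, 3000));
    }
}
```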

So my questions are:


·         Would we expect to be able to run a distributed SQL query by 
connecting via JDBC directly to a specific Drillbit?

·         Would we expect to be able to run a distributed SQL query by 
connecting via ODBC directly to a specific Drillbit?

·         Would we expect to be able to run a distributed SQL query by 
connecting via JDBC through a ZooKeeper quorum connection?

·         Would we expect to be able to run a distributed SQL query by 
connecting via ODBC through a ZooKeeper quorum connection?

·         How can we tell from the explain plan output or the JSON QEP that a 
distributed SQL query used multiple nodes?

·         Any ideas why we can't connect via a ZooKeeper quorum connection?
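
On the explain plan question, one rough heuristic is sketched below. It 
assumes the text plan lists one operator per line, prefixed with an id of the 
form "<major fragment>-<operator>" (for example 00-01 or 01-02), and that 
data movement between fragments shows up as operators whose names end in 
"Exchange". Both the assumed format and the class name PlanInspector are 
illustrative only. More than one major fragment suggests the plan can be 
parallelised, but it does not by itself show which nodes ran the fragments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlanInspector {
    // Assumed line shape: optional indent, "NN-NN", whitespace, operator name.
    private static final Pattern PLAN_LINE =
            Pattern.compile("^\\s*(\\d+)-(\\d+)\\s+(\\w+)", Pattern.MULTILINE);

    /** Distinct major-fragment ids (the part before the dash) found in the plan text. */
    static Set<String> majorFragments(String planText) {
        Set<String> ids = new TreeSet<>();
        Matcher m = PLAN_LINE.matcher(planText);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /** Operator names ending in "Exchange", in the order they appear. */
    static List<String> exchanges(String planText) {
        List<String> names = new ArrayList<>();
        Matcher m = PLAN_LINE.matcher(planText);
        while (m.find()) {
            if (m.group(3).endsWith("Exchange")) {
                names.add(m.group(3));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Reads the text plan from standard input, e.g. pasted from sqlline.
        Scanner in = new Scanner(System.in).useDelimiter("\\A");
        String plan = in.hasNext() ? in.next() : "";
        System.out.println("Major fragments: " + majorFragments(plan));
        System.out.println("Exchange operators: " + exchanges(plan));
    }
}
```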

Any help or insights you can give would be most appreciated.

Thanks.
   Chris

Chris Drawater
Database Architect
Office +44 1635 232470  |  Fax +44 1635 232471
Email [email protected]<mailto:[email protected]>  |  Web 
www.arieso.com<http://www.arieso.com/>
