Hi Chris,

Welcome to the Drill community!
What are you using as a VM environment? Is it running on your own hardware or on a cloud provider such as AWS or Google Compute Engine? So far, Drill has required the nodes to communicate over UDP multicast to sync some query information around the cluster for distributed queries. We have found that these providers do not support this kind of network communication between nodes. We have been trying to develop a workaround, but for now distributed queries will fail in these environments.

If that is not your situation, looking through the logs for more detailed errors would be the most useful way to debug your specific issue.

Connecting to a specific node or to a ZK quorum should make no difference to whether distributed queries can execute. Both JDBC and ODBC connections will run distributed queries, with locality information taken into account where available.

Distributed queries can be identified by the exchanges inserted in the plan. An exchange is where data is sent between nodes to split or merge it when an operation is parallelized.

-Jason Altekruse

On Wed, Oct 22, 2014 at 9:04 AM, Chris Drawater <[email protected]> wrote:

> We have now resolved our JDBC and ODBC connection issues ….
>
> It turns out that although throughout we used IP addresses in all our configurations … somewhere down the line the IP Name of the DrillBit started to be used (in red below)…
>
> 16:42:21.834 [main-SendThread(10.44.18.101:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x1493348d1c30005, packet:: clientPath:null serverPath:null finished:false header:: 9,4 replyHeader:: 9,316,0 request:: '/drill/drillbits1/1b8d393d-9802-4e0b-9115-b60e3154829a,F response:: #a2431623864333933642d393830322d346530622d393131352d62363065333135343832396110fffffff1fffffff0ffffffa4ffffff9affffff93291a14a66472696c6c3110ffffffa2fffffff2118ffffffa3fffffff2120ffffffa4fffffff21,s{308,308,1413904677021,1413904677021,0,0,0,92661655187685376,67,0,308}
> 16:42:22.006 [main] DEBUG i.n.c.MultithreadEventLoopGroup - -Dio.netty.eventLoopThreads: 8
> 16:42:22.021 [main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.noKeySetOptimization: false
> 16:42:22.021 [main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.selectorAutoRebuildThreshold: 512
> 16:42:22.053 [main] DEBUG o.a.drill.exec.client.DrillClient - Connecting to server *drill1*:31010
> 16:42:24.767 [main] DEBUG i.n.util.internal.ThreadLocalRandom - -Dio.netty.initialSeedUniquifier: 0x2430146dd102a22c
> 16:42:24.783 [main] DEBUG i.n.channel.ChannelOutboundBuffer - -Dio.netty.threadLocalDirectBufferSize: 65536
> 16:42:24.783 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacity.default: 262144
> 16:42:24.798 [main] DEBUG io.netty.buffer.ByteBufUtil - -Dio.netty.allocator.type: unpooled
> 16:42:24.814 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.io.netty.buffer.ByteBufMatcher
> 16:42:24.814 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.OutboundRpcMessageMatcher
> 16:42:24.829 [Client-1] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.__matchers__.org.apache.drill.exec.rpc.InboundRpcMessageMatcher
> 16:42:24.845 [Client-1] INFO o.a.drill.exec.rpc.user.UserClient - Channel closed between local null and remote null
> SQLException : SQL state: null java.sql.SQLException: Failure while attempting to connect to Drill. ErrorCode: 0
>
> The resolution (for our test purposes) was simply to pop an entry in C:\Windows\System32\drivers\etc\hosts.
>
> But I’d still appreciate the answers to my questions:
>
> · Would we expect to be able to run a distr SQL query by connecting via JDBC direct to specific drillbit? *Seems Not?*
> · Would we expect to be able to run a distr SQL query by connecting via ODBC direct to specific drillbit?
> · Would we expect to be able to run a distr SQL query by connecting via JDBC via a zookeeper quorum connection?
> · Would we expect to be able to run a distr SQL query by connecting via ODBC via a zookeeper quorum connection?
> · How can we identify the use of multiple nodes due to a distributed SQL query via explain plan output or the JSON QEP?
>
> Thanks,
>
> Chris
>
> *From:* Chris Drawater
> *Sent:* 22 October 2014 10:39
> *To:* '[email protected]'
> *Subject:* Distributed SQL on Drill question
>
> Hi,
>
> We have started to evaluate Apache Drill.
>
> Using multiple nodes/VMs we wish to stream JSON data (via Apache Storm) to persistent local filesystem (with a consistent dir structure so we can define various storage plugins) and then use Drill to run distributed SQL queries across these JSON files.
>
> If at all possible we don't wish to install Hadoop HDFS/Hbase/Hive.
>
> SQL execution would hopefully be via ODBC and/or JDBC.
>
> So far, embedded drill experiments work just great but the data processed is local to the sqlline.
>
> Using SQL against JSON works really well.
>
> Unfortunately, experiments to run distributed SQL queries have (so far) not been successful.
>
> We have tried
>
> · a 3 node VM based system with a 3 node zookeeper quorum + 1 drillbit per node
> · a 3 node VM based system with a single zookeeper instance (covering all 3 nodes) + again 1 drillbit per node
>
> and 'select * from sys.drillbits' from sqlline outputs all 3 drill bits in both cases. *Likewise, zookeeper confirms the existence of the drill cluster.*
>
> But we’ve not managed to run a distributed query.
>
> We have tested against both the 0.5 release and a 0.6 build on 64 bit Ubuntu 14.04.
>
> So far, we can only connect via ODBC to a specific drillbit.
>
> We cannot get the JDBC driver (using squirrel) to work.
>
> Also ODBC to a zookeeper quorum doesn't appear to work. *But we have tested client access using telnet IP_ADDR 2181 and that's OK.*
>
> So my questions are:
>
> · Would we expect to be able to run a distr SQL query by connecting via JDBC direct to specific drillbit?
> · Would we expect to be able to run a distr SQL query by connecting via ODBC direct to specific drillbit?
> · Would we expect to be able to run a distr SQL query by connecting via JDBC via a zookeeper quorum connection?
> · Would we expect to be able to run a distr SQL query by connecting via ODBC via a zookeeper quorum connection?
> · How can we identify the use of multiple nodes due to a distributed SQL query via explain plan output or the JSON QEP?
> · Any ideas or issues why we can't connect via a zookeeper quorum connection?
>
> Any help or insights you can give would be most appreciated.
>
> Thanks.
>
> Chris
>
> Chris Drawater
> Database Architect
> <http://www.arieso.com/>
> Office +44 1635 232470 | Fax +44 1635 232471
> Email [email protected] | Web www.arieso.com
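
For reference, here is a minimal Java sketch of the two things discussed in this thread: the two JDBC connection styles (via a ZooKeeper quorum or direct to one drillbit) and a check for exchanges in plan text. The class and method names are hypothetical. The URL forms follow the Drill JDBC driver's documented `zk=` and `drillbit=` patterns, but whether the 0.5/0.6 builds accept both is an assumption. The host, port, ZooKeeper root and cluster id are taken from the log quoted above. The plan check only looks for the substring "Exchange"; the operator names in the sample strings are illustrative, not output from a real query.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Drill: builds JDBC URLs and scans plan text.
final class DrillConnectionSketch {

    // URL for connecting through a ZooKeeper quorum.
    // zkHosts entries are "host:port"; zkRoot and clusterId mirror the
    // znode path seen in the log (/drill/drillbits1).
    static String viaZookeeper(List<String> zkHosts, String zkRoot, String clusterId) {
        return "jdbc:drill:zk=" + String.join(",", zkHosts) + "/" + zkRoot + "/" + clusterId;
    }

    // URL for connecting directly to one drillbit's user port.
    static String viaDrillbit(String host, int port) {
        return "jdbc:drill:drillbit=" + host + ":" + port;
    }

    // Rough check: a plan containing any "...Exchange" operator moves data
    // between fragments, which is how a distributed query shows up in the plan.
    static boolean planHasExchange(String planText) {
        return planText != null && planText.contains("Exchange");
    }

    public static void main(String[] args) {
        System.out.println(viaZookeeper(Arrays.asList("10.44.18.101:2181"), "drill", "drillbits1"));
        System.out.println(viaDrillbit("10.44.18.101", 31010));

        // Illustrative plan fragments (assumed shapes, not real EXPLAIN output).
        System.out.println(planHasExchange("00-01 UnionExchange"));
        System.out.println(planHasExchange("00-01 Scan"));
    }
}
```

Either URL would be passed to `DriverManager.getConnection` with the Drill JDBC driver on the classpath. The plan text to scan would come from running `EXPLAIN PLAN FOR <query>` over that connection.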
