That's what we were thinking. That way, the only work left is to implement a mapping between Enumerable and value vectors to get the data into Drill.
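As a rough sketch of what such a mapping involves, the Java below drains a row-at-a-time source into one list per column. Everything in it is an assumption for illustration: a plain Iterator<Object[]> stands in for Calcite's Enumerable, plain lists stand in for Drill's value vectors, and the names RowsToColumns and toColumns are hypothetical. It does not use the real Drill or Calcite APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only: plain Java collections stand in for
// Calcite's Enumerable (the row source) and Drill's value vectors (the columns).
final class RowsToColumns {

    // Drains a row-at-a-time source into one list per column.
    static List<List<Object>> toColumns(Iterator<Object[]> rows, int columnCount) {
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(new ArrayList<>());
        }
        while (rows.hasNext()) {
            Object[] row = rows.next();
            if (row.length != columnCount) {
                throw new IllegalArgumentException(
                    "expected " + columnCount + " fields but row has " + row.length);
            }
            // Append each field of the row to the column it belongs to.
            for (int c = 0; c < columnCount; c++) {
                columns.get(c).add(row[c]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "alice"});
        rows.add(new Object[] {2, "bob"});
        List<List<Object>> columns = toColumns(rows.iterator(), 2);
        System.out.println(columns.get(0)); // the id column
        System.out.println(columns.get(1)); // the name column
    }
}
```

A real plugin would presumably write typed values in batches rather than boxing everything into Object lists, but the row-to-column transposition is the core of the mapping.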
Linfeng, does this sound like something you would be interested in trying to contribute to Drill?

-Jason

On Thu, Jan 15, 2015 at 10:59 AM, Julian Hyde <[email protected]> wrote:

> Calcite has a JDBC adapter, so it can convert parts of the query plan to
> SQL and execute it in the JDBC source. Generally you want to push down as
> much as possible. It works with MySQL and pretty much any JDBC data source.
>
> Drill uses Calcite internally, so if Drill exposed Calcite adapters as
> data sources this problem would be solved.
>
> Julian
>
>
> On Jan 15, 2015, at 10:30 AM, Jason Altekruse <[email protected]>
> wrote:
> >
> > Hello Linfeng!
> >
> > Welcome to the Drill community!
> >
> > Currently Drill does not support querying traditional databases like
> > MySQL, but it is a feature we have discussed adding for some time. If
> > you are interested in trying to add your own, you can start by taking a
> > look at some of the existing storage plugins. For a basic overview of
> > the process of populating the in-memory data structure for records in
> > Drill, I would recommend reading through the JSON reader implementation.
> > While the in-memory format for Drill is columnar, to allow for various
> > execution optimizations, a simple row-by-row interface is provided for
> > writing into the data structure.
> >
> > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java
> >
> > Another important aspect of creating a storage plugin is leveraging the
> > underlying system's capabilities to avoid reading an entire dataset for
> > each query. We have a system in place for pushing down filters and
> > selected columns to the underlying storage layer. This is not required
> > to run queries in Drill, as you can simply read all of the data out of
> > the storage system and let Drill filter it as necessary, but for most
> > workloads (just about anything but a select *) this will be sub-optimal.
> > For an example of some of this filter rewriting you can see the work
> > done on the Mongo plugin.
> >
> > https://github.com/apache/drill/tree/master/contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo
> >
> > An important thing to note here is that the filters have to be rewritten
> > from Drill convention to Mongo convention. In the case of a MySQL plugin,
> > the storage system really could perform any SQL operations that require
> > only the tables within your MySQL instance, and likely further optimize
> > those subsets of the queries. To implement this functionality, we could
> > go far beyond what we have implemented with other storage plugins in
> > terms of pushdown, which requires more complex rewriting. Thankfully
> > Calcite, a dependency that Drill already leverages for its planning, can
> > already do these more complex pushdowns. It uses an Enumerable interface
> > (a fancy iterator) to expose data from one of its storage engines, such
> > as a relational database. Calcite also includes a simpler single-node
> > execution engine that can act much like Drill for smaller workloads.
> > Likely the best first step for getting MySQL working would be to write a
> > Drill storage plugin that takes an Enumerable as input; then we should
> > be able to leverage all of Calcite's existing functionality for doing
> > the query rewrites and pushdowns relatively easily.
> >
> > -Jason Altekruse
> >
> > On Wed, Jan 7, 2015 at 7:39 PM, [email protected] <
> > [email protected]> wrote:
> >
> >> Hello:
> >> Does Drill currently support MySQL?
> >> If I want to develop my own plugin for connecting to MySQL, how
> >> should I go about it?
> >>
> >> [email protected]
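
To make the pushdown discussion in this thread concrete, here is a minimal sketch of the difference between reading everything and pushing selected columns and a filter down to the source. It is an illustration under stated assumptions, not Drill's or Calcite's actual mechanism: the class PushdownSketch, the method buildQuery, and the table and column names are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: these names are not Drill or Calcite APIs.
final class PushdownSketch {

    // Builds the query sent to the source. An empty column list means "select *";
    // a null filter means no WHERE clause, so the engine must filter after reading.
    // Assumption: inputs are trusted identifiers and expressions; a real plugin
    // would generate SQL from a query plan, not by concatenating strings.
    static String buildQuery(String table, List<String> columns, String filter) {
        String projection = columns.isEmpty() ? "*" : String.join(", ", columns);
        StringBuilder sql = new StringBuilder("SELECT " + projection + " FROM " + table);
        if (filter != null) {
            sql.append(" WHERE ").append(filter);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // No pushdown: the source returns every row and every column.
        System.out.println(buildQuery("orders", Collections.<String>emptyList(), null));
        // With pushdown: the source returns only the columns and rows the query needs.
        System.out.println(buildQuery("orders", Arrays.asList("id", "total"), "total > 100"));
    }
}
```

In the first call the engine has to read the whole table and filter it itself; in the second, the database does the filtering and projection before any data is transferred.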
