I've done some more work on the BitComImpl, so you should probably constrain your work to the Rpc base classes, the User client and server, and the BitCom client and server.
J

On Thu, Apr 25, 2013 at 9:10 PM, David Alves <[email protected]> wrote:
> Hi Jacques
>
> I can take the RPC stuff.
> Have you made any progress on Bit<>Bit comms?
>
> Best
> David
>
> On Apr 25, 2013, at 11:06 PM, Jacques Nadeau <[email protected]> wrote:
>
> > I'm working on the execwork stuff, and if someone would like to help
> > out, here are a couple of things that need doing. I figured I'd drop
> > them here and see if anyone wants to work on them in the next couple
> > of days. If so, let me know; otherwise I'll be picking them up soon.
> >
> > *RPC*
> > - RPC Layer Handshakes: Currently, I haven't implemented the handshake
> > that should happen in either the User <> Bit or the Bit <> Bit layer.
> > The plan was to use an additional inserted event handler that removes
> > itself from the event pipeline after a successful handshake, or
> > disconnects the channel on a failed handshake (with appropriate
> > logging). The main validation at this point will simply be confirming
> > that both endpoints are running on the same protocol version. The only
> > other information currently needed is that, in Bit <> Bit
> > communication, the client should inform the server of its DrillEndpoint
> > so that the server can map that endpoint for future communication in
> > the other direction.
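A minimal sketch of that self-removing handshake handler, assuming Netty 4; HandshakeMessage, EndpointRegistry and the version constant are invented stand-ins, not actual Drill classes:

    import io.netty.channel.Channel;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    // Sketch only. HandshakeMessage and EndpointRegistry stand in for the
    // real protobuf message and BitCom bookkeeping.
    public class HandshakeHandler extends ChannelInboundHandlerAdapter {

      interface HandshakeMessage {        // stand-in for the protobuf type
        int getProtocolVersion();
        Object getEndpoint();             // stand-in for DrillEndpoint
      }

      interface EndpointRegistry {        // stand-in for BitCom's channel map
        void register(Object endpoint, Channel channel);
      }

      private static final int PROTOCOL_VERSION = 1;  // assumed constant
      private final EndpointRegistry registry;

      public HandshakeHandler(EndpointRegistry registry) {
        this.registry = registry;
      }

      @Override
      public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (!(msg instanceof HandshakeMessage)) {
          // Nothing should arrive before the handshake completes; treat
          // this as a protocol error and drop the connection.
          ctx.close();
          return;
        }
        HandshakeMessage h = (HandshakeMessage) msg;
        if (h.getProtocolVersion() != PROTOCOL_VERSION) {
          // Failed handshake: log and disconnect the channel.
          System.err.println("Handshake failed: peer version "
              + h.getProtocolVersion() + ", expected " + PROTOCOL_VERSION);
          ctx.close();
          return;
        }
        // Bit <> Bit: remember the client's endpoint so the server can
        // initiate communication in the other direction later.
        registry.register(h.getEndpoint(), ctx.channel());
        // Successful handshake: remove this handler from the pipeline.
        ctx.pipeline().remove(this);
      }
    }

Installed last in the pipeline at channel setup, the handler sees only the first inbound message, validates it, and then gets out of the hot path.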
> > *DataTypes*
> > - General Expansion: Currently, we have a hodgepodge of datatypes
> > within org.apache.drill.common.expression.types.DataType. We need to
> > clean this up. There should be types that map to standard SQL types.
> > My thinking is that we should actually have separate types for each of
> > nullable, non-nullable and repeated (required, optional and repeated in
> > protobuf vernacular), since we'll generally operate on those values
> > completely differently (and each type should reveal which it is). We
> > should also have a relationship mapping from each to the others (e.g.
> > how to convert a signed 32-bit int into a nullable signed 32-bit int).
> >
> > - Map Types: We don't need a nullable variant, but we will need
> > different map types: inline and fieldwise. I think these will be
> > useful for the execution engine and will be leveraged depending on
> > particular needs. For example, fieldwise will be a natural fit where
> > we're operating on columnar data and doing an explode or other
> > fieldwise nested operation, while inline will be useful when we're
> > doing things like sorting a complex field. Inline will also be
> > appropriate where we have extremely sparse record sets. We'll just
> > need transformation methods between the two variations. In the case
> > of a fieldwise map type field, the field is virtual and only exists to
> > contain its child fields.
> >
> > - Non-static DataTypes: We need types that don't fit the static data
> > type model above. Examples include fixed-width types (e.g. a 10-byte
> > string), polymorphic (inline encoded) types (number or string depending
> > on the record) and repeated nested versions of our other types. These
> > are a little more gnarly, as we need to support canonicalization of
> > them. Optiq has some methods for handling this kind of type system, so
> > it probably makes sense to leverage that system.
> >
> > *Expression Type Materialization*
> > - LogicalExpression type materialization: Right now, LogicalExpressions
> > include support for late type binding. As part of the record batch
> > execution path, these need to get materialized with correct casting,
> > etc., based on the actual found schema. As such, we need a function
> > that takes a LogicalExpression tree, applies a materialized BatchSchema
> > and returns a new LogicalExpression tree with full type settings. As
> > part of this process, all types need to be cast as necessary, and full
> > validation of the tree should be done. Timothy has pending validation
> > work on a pull request that would be a good piece of code to leverage
> > for this. We also have a visitor model for the expression tree that
> > should be able to aid in constructing the updated LogicalExpression
> > (see the sketch after this section).
> >
> > - LogicalExpression to Java expression conversion: We need to be able
> > to convert our logical expressions into Java code expressions.
> > Initially, this should be done in a simplistic way, using things like
> > implicit boxing just to get something working. This will likely be
> > specialized per major type (nullable, non-nullable and repeated), and a
> > framework might actually make the most sense, distinguishing the
> > LogicalExpressions by these types.
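A rough sketch of the materialization pass from the first item above. Every type name here (LogicalExpression, FieldReference, FunctionCall, BatchSchema, DataType, CastExpression) is an invented stand-in with an assumed shape, and a real version would run through the existing expression visitor model rather than an instanceof chain:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch only: all referenced types are stand-ins with assumed shapes,
    // not Drill's actual signatures.
    public class TypeMaterializer {

      /** Rewrite a late-bound expression tree against a resolved schema. */
      public LogicalExpression materialize(LogicalExpression expr,
                                           BatchSchema schema) {
        if (expr instanceof FieldReference) {
          FieldReference ref = (FieldReference) expr;
          DataType found = schema.getType(ref.getPath());
          if (found == null) {
            throw new IllegalArgumentException("No such field: " + ref.getPath());
          }
          return ref.withType(found);              // bind the discovered type
        }
        if (expr instanceof FunctionCall) {
          FunctionCall call = (FunctionCall) expr;
          List<LogicalExpression> args = new ArrayList<>();
          for (int i = 0; i < call.argCount(); i++) {
            LogicalExpression arg = materialize(call.arg(i), schema);
            DataType expected = call.expectedArgType(i);
            if (!expected.equals(arg.getType())) {
              arg = new CastExpression(arg, expected);  // insert needed cast
            }
            args.add(arg);
          }
          return call.withArgs(args);              // rebuilt, fully typed
        }
        return expr;  // literals and friends already carry their types
      }
    }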
> > *JDBC*
> > - The Drill JDBC driver layer needs to be updated to leverage our
> > ZooKeeper coordination locations so that it can correctly find the
> > cluster location.
> > - The Drill JDBC driver should also manage reconnects, so that if it
> > loses its connection to a particular Drillbit partner, it will
> > reconnect to another available node in the cluster.
> > - Someone should point SQuirreL at Julian's latest work and see how
> > things go...
> >
> > *ByteCode Engineering*
> > - We need to put together a concrete class materialization strategy.
> > My thinking for relational operators and code generation is that in
> > most cases, we'll have an interface and a template class for a
> > particular relational operator. We will build a template class that
> > has all the generic stuff implemented but makes calls to empty methods
> > where it expects lower-level operations to occur. This allows things
> > like looping and certain types of null management to be fully
> > materialized in source code without having to deal with the
> > complexities of bytecode generation. It also eases testing complexity.
> > When a particular implementation is required, the Drillbit will be
> > responsible for generating updated method bodies as required for the
> > record-level expressions, marking all the methods and the class as
> > final, then loading the implementation into the query-level
> > classloader. Note that the production Drillbit will never load the
> > template class into the JVM; it will simply utilize it in bytecode
> > form. I was hoping someone could take a look at pulling together a
> > cohesive approach to doing this using ASM and Janino (likely utilizing
> > the JDK commons-compiler mode). The interface should be pretty simple:
> > input is an interface, a template class name, a set of
> > (method_signature, method_body_text) objects and a varargs of objects
> > that are required for object instantiation. The return should be an
> > instance of the interface. The implementation should check things like
> > each provided method_signature against the available method blocks,
> > that the method blocks being replaced are empty, that the object
> > constructor matches the set of arguments provided by the instantiation
> > request, etc.
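One possible shape for that entry point, as a hedged sketch; every name here is invented for illustration:

    import java.util.Set;

    // Sketch only: all names invented for illustration.
    public interface ClassMaterializer {

      /** A replacement body for one empty method in the template class. */
      class MethodReplacement {
        public final String methodSignature;  // must match an empty template method
        public final String methodBodyText;   // Java source for the new body

        public MethodReplacement(String methodSignature, String methodBodyText) {
          this.methodSignature = methodSignature;
          this.methodBodyText = methodBodyText;
        }
      }

      /**
       * Produce an instance of iface from the named template class: verify
       * each signature matches an empty method block in the template, splice
       * in the generated bodies, mark the methods and the class final, and
       * load the result into the query-level classloader. The template is
       * only ever manipulated as bytecode, never loaded directly.
       */
      <T> T materialize(Class<T> iface,
                        String templateClassName,
                        Set<MethodReplacement> replacements,
                        Object... constructorArgs);
    }

Under these assumptions, Janino (via the commons-compiler API) could compile the generated method bodies, with ASM splicing the resulting bytecode into the template class before it is handed to the query-level classloader.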
> > *ByteBuf Improvements*
> > - Our BufferAllocator should support child allocators (getChild())
> > with their own memory maximums and accounting (so we can determine the
> > memory overhead of particular queries). We also need to be able to
> > release an entire child allocation at once.
> > - We need to create a number of primitive-type-specific wrapping
> > classes for ByteBuf. These additions include fixed-offset indexing for
> > operations (e.g. index 1 of an int buffer should be at 4 bytes), adding
> > support for unsigned values (my preference would be to leverage the
> > work in Guava if that makes sense) and softening the hard bounds checks
> > to assert checks to increase production performance. While we could do
> > this via the ByteBuf interface, from everything I've experienced and
> > read, we need to minimize issues with inlining and performance, so we
> > really need to be able to modify/refer to PooledUnsafeDirectByteBuf
> > directly for the wrapping classes. Of course, it is a final
> > package-private class. Short term, that means we really need to create
> > a number of specific buffer types that wrap it and just put them in the
> > io.netty.buffer package (or alternatively create a Drill version or
> > wrapper).
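To make the fixed-offset indexing concrete, a minimal sketch of one such wrapper; the class name is illustrative, and a production version would hold PooledUnsafeDirectByteBuf directly (hence living in io.netty.buffer) to keep call sites monomorphic for inlining:

    package io.netty.buffer;  // placed here to reach package-private internals

    // Sketch only: an int-typed view over a buffer with element indexing
    // and assert-based (soft) bounds checks. A production version would
    // hold PooledUnsafeDirectByteBuf, not the ByteBuf interface.
    public final class IntBuf {
      private final ByteBuf delegate;
      private final int capacity;  // in ints, not bytes

      public IntBuf(ByteBuf delegate) {
        this.delegate = delegate;
        this.capacity = delegate.capacity() / 4;
      }

      public int get(int index) {
        assert index >= 0 && index < capacity : "bad index: " + index;
        return delegate.getInt(index << 2);   // element index -> byte offset
      }

      public void set(int index, int value) {
        assert index >= 0 && index < capacity : "bad index: " + index;
        delegate.setInt(index << 2, value);
      }

      // Unsigned read in the spirit of Guava's UnsignedInts: widen to long.
      public long getUnsigned(int index) {
        return get(index) & 0xFFFFFFFFL;
      }
    }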