I've looked through it, and it looks like it can leverage shared memory, which I was looking for anyway. I also like the way garbage collection works (GC in Java also clears the off-heap memory). I'll take a deeper look during the weekend.
-david

On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]> wrote:

I've looked at that in the past and think the idea of using it here is very good. ByteBuf seems nice as it has things like endianness capabilities, reference counting and management, and direct Netty support. On the flip side, LArray is nice for its large-array capabilities and better input/output interfaces. The best approach might be to define a new ByteBuf implementation that leverages LArray. I'll take a look at this in a few days if someone else doesn't want to.

j

On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:

For *ByteBuf Improvements*, have you looked at LArrayJ (https://github.com/xerial/larray)? It has those wrappers, and I found it quite useful. The same author has also written a Java version of Snappy compression. I'm not sure whether you plan to add compression, but one of the nice things I could do was use the memory offsets for the source (compressed data) and destination (uncompressed array) and do the decompression off-heap. It supports lookup by index and has wrappers for most of the primitive data types.

Are you looking at something like this?

thanks,
Kishore G

On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]> wrote:

They are on the list, but the list is long :)

Have a good weekend.

On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:

So if no one picks anything up, you will be done with all the work in the next couple of days? :)

I would like to help out, but I'm traveling to LA over the weekend.

I'll sync with you Monday to see how I can help then.

Tim

Sent from my iPhone

On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]> wrote:

I'm working on the execwork stuff, and if someone would like to help out, here are a couple of things that need doing. I figured I'd drop them here and see if anyone wants to work on them in the next couple of days. If so, let me know; otherwise, I'll be picking them up soon.

*RPC*
- RPC Layer Handshakes: Currently, I haven't implemented the handshake that should happen in either the User <> Bit or the Bit <> Bit layer. The plan was to use an additional inserted event handler that removes itself from the event pipeline after a successful handshake, or disconnects the channel on a failed handshake (with appropriate logging). The main validation at this point will simply be confirming that both endpoints are running the same protocol version. The only other information currently needed is that, in Bit <> Bit communication, the client should inform the server of its DrillEndpoint so that the server can map that for future communication in the other direction.
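To make the pipeline idea concrete, here is a minimal sketch of such a self-removing handler, assuming Netty 4. HandshakeMessage and the version constant are hypothetical placeholders; the real message would be one of our protobufs.

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    public class HandshakeHandler extends ChannelInboundHandlerAdapter {

      // Placeholder for the real (protobuf) handshake message.
      public static class HandshakeMessage {
        int protocolVersion;
      }

      private static final int PROTOCOL_VERSION = 1; // placeholder value

      @Override
      public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HandshakeMessage
            && ((HandshakeMessage) msg).protocolVersion == PROTOCOL_VERSION) {
          // Successful handshake: remove this handler so subsequent
          // messages flow straight through the rest of the pipeline.
          ctx.pipeline().remove(this);
        } else {
          // Failed handshake (or unexpected first message): log and
          // disconnect the channel.
          ctx.close();
        }
      }
    }

The Bit <> Bit variant would additionally capture the client's DrillEndpoint from the handshake message before removing itself.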
*DataTypes*
- General Expansion: Currently, we have a hodgepodge of datatypes within org.apache.drill.common.expression.types.DataType. We need to clean this up. There should be types that map to standard SQL types. My thinking is that we should actually have separate types for each of nullable, non-nullable and repeated (required, optional and repeated in protobuf vernacular), since we'll generally operate on those values completely differently (and each type should reveal which it is). We should also have a relationship mapping from each to the others (e.g. how to convert a signed 32-bit int into a nullable signed 32-bit int). A sketch of this modeling appears after this list.

- Map Types: We don't need nullable, but we will need different map types: inline and fieldwise. I think these will be useful for the execution engine and will be leveraged depending on particular needs -- for example, fieldwise will be a natural fit where we're operating on columnar data and doing an explode or other fieldwise nested operation, while inline will be useful when we're doing things like sorting a complex field. Inline will also be appropriate where we have extremely sparse record sets. We'll just need transformation methods between the two variations. In the case of a fieldwise map type field, the field is virtual and only exists to contain its child fields.

- Non-static DataTypes: We need types that don't fit the static data type model above. Examples include fixed-width types (e.g. a 10-byte string), polymorphic (inline-encoded) types (number or string depending on the record) and repeated nested versions of our other types. These are a little more gnarly, as we need to support canonicalization of them. Optiq has some methods for handling this kind of type system, so it probably makes sense to leverage that.
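As promised above, a sketch of the per-cardinality type modeling (all names illustrative, nothing settled):

    // One type object per (kind, cardinality) pair, so every consumer can
    // tell whether it is dealing with required, optional or repeated values.
    enum Cardinality { REQUIRED, OPTIONAL, REPEATED }

    interface MajorType {
      Cardinality getCardinality();

      // The relationship mapping between variants, e.g. calling
      // asOptional() on a required signed 32-bit int type yields the
      // nullable signed 32-bit int type.
      MajorType asRequired();
      MajorType asOptional();
      MajorType asRepeated();
    }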
*Expression Type Materialization*
- LogicalExpression type materialization: Right now, LogicalExpressions include support for late type binding. As part of the record batch execution path, these need to get materialized with correct casting, etc., based on the actual found schema. As such, we need a function that takes a LogicalExpression tree, applies a materialized BatchSchema and returns a new LogicalExpression tree with full type settings. As part of this process, all types need to be cast as necessary, and full validation of the tree should be done. Timothy has pending validation work on a pull request that would be a good piece of code to leverage for that need. We also have a visitor model for the expression tree that should be able to aid in constructing the updated LogicalExpression. A sketch of the entry point appears after this list.
- LogicalExpression to Java expression conversion: We need to be able to convert our logical expressions into Java code expressions. Initially, this should be done in a simplistic way, using things like implicit boxing just to get something working. This will likely be specialized per major type (nullable, non-nullable and repeated), and a framework might make the most sense, actually just distinguishing the LogicalExpressions by these types.
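The entry point for the first item might be shaped roughly like this (a sketch only; LogicalExpression and BatchSchema are the existing types named above, everything else is illustrative):

    // Takes a late-bound expression tree plus the schema actually found at
    // runtime and returns a fully typed tree, inserting casts where needed
    // and validating as it goes (likely implemented with the existing
    // expression visitor model).
    interface ExpressionMaterializer {
      LogicalExpression materialize(LogicalExpression expr, BatchSchema schema);
    }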
*JDBC*
- The Drill JDBC driver layer needs to be updated to leverage our ZooKeeper coordination locations so that it can correctly find the cluster location.
- The Drill JDBC driver should also manage reconnects, so that if it loses its connection with a particular Drillbit partner, it will reconnect to another available node in the cluster.
- Someone should point SQuirreL at Julian's latest work and see how things go...

*ByteCode Engineering*
- We need to put together a concrete class materialization strategy. My thinking for relational operators and code generation is that in most cases we'll have an interface and a template class for a particular relational operator. We will build a template class that has all the generic stuff implemented but makes calls to empty methods where it expects lower-level operations to occur. This allows things like looping and certain types of null management to be fully materialized in source code without having to deal with the complexities of bytecode generation. It also eases testing complexity. When a particular implementation is required, the Drillbit will be responsible for generating updated method bodies as required for the record-level expressions, marking all the methods and the class as final, then loading the implementation into the query-level classloader. Note that the production Drillbit will never load the template class into the JVM and will simply utilize it in bytecode form. I was hoping someone could take a look at pulling together a cohesive approach to doing this using ASM and Janino (likely utilizing the JDK commons-compiler mode). The interface should be pretty simple: the input is an interface, a template class name, a set of (method_signature, method_body_text) objects and a varargs of objects that are required for object instantiation. The return should be an instance of the interface. The implementation should check things like the provided method_signatures against the available method blocks, that the method blocks being replaced are empty, that the object constructor matches the set of object arguments provided by the instantiation request, etc.
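Roughly, the contract I have in mind (a sketch; all names are illustrative):

    import java.util.Set;

    // Produces a final implementation of iface by replacing the empty
    // method bodies of the named template class and loading the result
    // into the query-level classloader.
    interface ClassMaterializer {

      // Pairs a method signature in the template with the Java source
      // text of its generated body.
      class MethodBody {
        final String signature; // must match an empty method in the template
        final String bodyText;  // record-level expression code

        MethodBody(String signature, String bodyText) {
          this.signature = signature;
          this.bodyText = bodyText;
        }
      }

      // constructorArgs must match a constructor of the template class;
      // validation failures should fail fast with a descriptive exception.
      <T> T materialize(Class<T> iface,
                        String templateClassName,
                        Set<MethodBody> methodBodies,
                        Object... constructorArgs);
    }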
*ByteBuf Improvements*
- Our BufferAllocator should support child allocators (getChild()) with their own memory maximums and accounting (so we can determine the memory overhead of particular queries). We also need to be able to release an entire child allocation at once.
- We need to create a number of primitive-type-specific wrapping classes for ByteBuf. These additions include fixed-offset indexing for operations (e.g. index 1 of an int buffer should be at 4 bytes), support for unsigned values (my preference would be to leverage the work in Guava if that makes sense) and changing the hard bounds checks to softer assert checks to increase production performance. While we could do this via the ByteBuf interface, from everything I've experienced and read, we need to minimize issues with inlining and performance, so we really need to be able to modify/refer to PooledUnsafeDirectByteBuf directly for the wrapping classes. Of course, it is a final package-private class. Short term, that means we really need to create a number of specific buffer types that wrap it and just put them in the io.netty.buffer package (or alternatively create a Drill version or wrapper).
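To illustrate both items, a sketch (the real wrappers would sit on PooledUnsafeDirectByteBuf in the io.netty.buffer package rather than on the ByteBuf interface, and all names here are illustrative):

    import io.netty.buffer.ByteBuf;

    // Child allocators with their own limits and accounting.
    interface BufferAllocator extends AutoCloseable {
      ByteBuf buffer(int size);
      BufferAllocator getChild(long maxBytes); // per-query accounting
      long getAllocatedMemory();
      void close(); // releases the entire child allocation at once
    }

    // Fixed-offset int view over a buffer: logical index i lives at byte
    // i * 4. Bounds are checked with asserts only, so production runs
    // (without -ea) skip the checks entirely.
    final class IntBuf {
      private final ByteBuf delegate;

      IntBuf(ByteBuf delegate) {
        this.delegate = delegate;
      }

      int get(int index) {
        assert index >= 0 && (index + 1) * 4 <= delegate.capacity();
        return delegate.getInt(index * 4);
      }

      void set(int index, int value) {
        assert index >= 0 && (index + 1) * 4 <= delegate.capacity();
        delegate.setInt(index * 4, value);
      }
    }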