Jacques, I think this is the one I emailed you before that has no licensing info.
Tim

Sent from my iPhone

On Apr 26, 2013, at 9:30 AM, David Alves <[email protected]> wrote:

> I've looked through it and it looks like it can leverage shared memory,
> which I was looking for anyway.
> I also like the way garbage collection works (GC in Java also clears
> off-heap).
> I'll take a deeper look during the weekend.
>
> -david
>
> On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]> wrote:
>
>> I've looked at that in the past and think the idea of using it here is
>> very good. It seems like ByteBuf is nice, as it has things like
>> endianness capabilities, reference counting and management, and direct
>> Netty support. On the flip side, LArray is nice for its large-array
>> capabilities and better input/output interfaces. The best approach
>> might be to define a new ByteBuf implementation that leverages LArray.
>> I'll take a look at this in a few days if someone else doesn't want to.
>>
>> j
>>
>> On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:
>>
>>> For *ByteBuf Improvements*, have you looked at LArray
>>> (https://github.com/xerial/larray)? It has those wrappers and I found
>>> it quite useful. The same person has also written a Java version of
>>> Snappy compression. Not sure if you guys have plans to add
>>> compression, but one of the nice things I could do was use the memory
>>> offsets for the source (compressed data) and dest (uncompressed
>>> array) and do the decompression off-heap. It supports the need for
>>> lookup by index and has wrappers for most of the primitive data
>>> types.
>>>
>>> Are you looking at something like this?
>>>
>>> thanks,
>>> Kishore G
>>>
>>> On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]> wrote:
>>>
>>>> They are on the list, but the list is long :)
>>>>
>>>> Have a good weekend.
>>>>
>>>> On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:
>>>>
>>>>> So if no one picks anything up, you will be done with all the work
>>>>> in the next couple of days? :)
>>>>>
>>>>> Would like to help out, but I'm traveling to LA over the weekend.
>>>>>
>>>>> I'll sync with you Monday to see how I can help then.
>>>>>
>>>>> Tim
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>
>>>>>> I'm working on the execwork stuff, and if someone would like to
>>>>>> help out, here are a couple of things that need doing. I figured
>>>>>> I'd drop them here and see if anyone wants to work on them in the
>>>>>> next couple of days. If so, let me know; otherwise I'll be picking
>>>>>> them up soon.
>>>>>>
>>>>>> *RPC*
>>>>>> - RPC Layer Handshakes: Currently, I haven't implemented the
>>>>>> handshake that should happen in either the User <> Bit or the
>>>>>> Bit <> Bit layer. The plan was to use an additional inserted event
>>>>>> handler that removes itself from the event pipeline after a
>>>>>> successful handshake, or disconnects the channel on a failed
>>>>>> handshake (with appropriate logging). The main validation at this
>>>>>> point will simply be confirming that both endpoints are running on
>>>>>> the same protocol version. The only other information currently
>>>>>> needed is that, in Bit <> Bit communication, the client should
>>>>>> inform the server of its DrillEndpoint so that the server can map
>>>>>> that for future communication in the other direction.
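[A minimal sketch of the self-removing handshake handler described above,
using Netty 4's pipeline API. The HandshakeMessage placeholder and the
PROTOCOL_VERSION constant are illustrative assumptions, not the actual
Drill protobuf classes.]

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    public class HandshakeHandler extends ChannelInboundHandlerAdapter {

      // Assumed constant; the real check would compare protobuf-defined versions.
      static final int PROTOCOL_VERSION = 1;

      // Illustrative placeholder for the first message exchanged on a connection.
      public static class HandshakeMessage {
        public int protocolVersion;
      }

      @Override
      public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        if (!(msg instanceof HandshakeMessage)) {
          // Anything arriving before a successful handshake is a protocol violation.
          ctx.close();
          return;
        }
        HandshakeMessage handshake = (HandshakeMessage) msg;
        if (handshake.protocolVersion != PROTOCOL_VERSION) {
          // Failed handshake: disconnect the channel (with appropriate logging).
          ctx.close();
          return;
        }
        // For Bit <> Bit connections, the server would also record the client's
        // DrillEndpoint here so it can route traffic in the other direction.
        // Successful handshake: remove this handler so later traffic skips it.
        ctx.pipeline().remove(this);
      }
    }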
>>>>>>
>>>>>> *DataTypes*
>>>>>> - General Expansion: Currently, we have a hodgepodge of datatypes
>>>>>> within org.apache.drill.common.expression.types.DataType. We need
>>>>>> to clean this up. There should be types that map to standard SQL
>>>>>> types. My thinking is that we should actually have separate types
>>>>>> for each of nullable, non-nullable and repeated (optional, required
>>>>>> and repeated in protobuf vernacular), since we'll generally operate
>>>>>> on those values completely differently (and each type should reveal
>>>>>> which it is). We should also have a relationship mapping from each
>>>>>> to the other (e.g. how to convert a signed 32-bit int into a
>>>>>> nullable signed 32-bit int).
>>>>>>
>>>>>> - Map Types: We don't need nullable, but we will need different map
>>>>>> types: inline and fieldwise. I think these will be useful for the
>>>>>> execution engine and will be leveraged depending on the particular
>>>>>> needs--for example, fieldwise will be a natural fit where we're
>>>>>> operating on columnar data and doing an explode or other fieldwise
>>>>>> nested operation, and inline will be useful when we're doing things
>>>>>> like sorting a complex field. Inline will also be appropriate where
>>>>>> we have extremely sparse record sets. We'll just need
>>>>>> transformation methods between the two variations. In the case of
>>>>>> a fieldwise map type field, the field is virtual and only exists to
>>>>>> contain its child fields.
>>>>>>
>>>>>> - Non-static DataTypes: We need types that don't fit the static
>>>>>> data type model above. Examples include fixed-width types (e.g. a
>>>>>> 10-byte string), polymorphic (inline-encoded) types (number or
>>>>>> string depending on the record) and repeated nested versions of our
>>>>>> other types. These are a little more gnarly, as we need to support
>>>>>> canonicalization of them. Optiq has some methods for handling this
>>>>>> kind of type system, so it probably makes sense to leverage that.
>>>>>>
>>>>>> *Expression Type Materialization*
>>>>>> - LogicalExpression type materialization: Right now,
>>>>>> LogicalExpressions include support for late type binding. As part
>>>>>> of the record batch execution path, these need to get materialized
>>>>>> with correct casting, etc. based on the actual found schema. As
>>>>>> such, we need a function which takes a LogicalExpression tree,
>>>>>> applies a materialized BatchSchema and returns a new
>>>>>> LogicalExpression tree with full type settings (see the sketch
>>>>>> below). As part of this process, all types need to be cast as
>>>>>> necessary and full validation of the tree should be done. Timothy
>>>>>> has pending work on validation in a pull request that would be a
>>>>>> good piece of code to leverage for this. We also have a visitor
>>>>>> model for the expression tree that should be able to aid in
>>>>>> constructing the updated LogicalExpression.
>>>>>> - LogicalExpression to Java expression conversion: We need to be
>>>>>> able to convert our logical expressions into Java code expressions.
>>>>>> Initially, this should be done in a simplistic way, using things
>>>>>> like implicit boxing just to get something working. This will
>>>>>> likely be specialized per major type (nullable, non-nullable and
>>>>>> repeated), and a framework that distinguishes LogicalExpressions by
>>>>>> these types might actually make the most sense.
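[A minimal, self-contained sketch of the materialization pass described
above. All class names here (LogicalExpr, FieldRef, Cast, FuncCall) are
hypothetical stand-ins for Drill's actual expression classes, and plain
recursion stands in for the visitor model mentioned in the thread.]

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    abstract class LogicalExpr {
      // Resolved type name, e.g. "INT32"; null while still late-bound.
      abstract String type();
      // Returns a new, fully-typed tree; collects problems into errors.
      abstract LogicalExpr materialize(Map<String, String> schema, List<String> errors);
    }

    final class FieldRef extends LogicalExpr {
      final String path; final String boundType;
      FieldRef(String path, String boundType) { this.path = path; this.boundType = boundType; }
      String type() { return boundType; }
      LogicalExpr materialize(Map<String, String> schema, List<String> errors) {
        String found = schema.get(path);   // late binding resolved against the found schema
        if (found == null) errors.add("unknown field: " + path);
        return new FieldRef(path, found);
      }
    }

    final class Cast extends LogicalExpr {
      final LogicalExpr input; final String target;
      Cast(LogicalExpr input, String target) { this.input = input; this.target = target; }
      String type() { return target; }
      LogicalExpr materialize(Map<String, String> schema, List<String> errors) {
        return new Cast(input.materialize(schema, errors), target);
      }
    }

    final class FuncCall extends LogicalExpr {
      final String name; final List<String> argTypes; final String returnType;
      final List<LogicalExpr> args;
      FuncCall(String name, List<String> argTypes, String returnType, List<LogicalExpr> args) {
        this.name = name; this.argTypes = argTypes; this.returnType = returnType; this.args = args;
      }
      String type() { return returnType; }
      LogicalExpr materialize(Map<String, String> schema, List<String> errors) {
        List<LogicalExpr> newArgs = new ArrayList<LogicalExpr>();
        for (int i = 0; i < args.size(); i++) {
          LogicalExpr a = args.get(i).materialize(schema, errors);
          if (a.type() != null && !a.type().equals(argTypes.get(i))) {
            a = new Cast(a, argTypes.get(i));   // insert the implicit cast
          }
          newArgs.add(a);
        }
        return new FuncCall(name, argTypes, returnType, newArgs);
      }
    }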
>>>>>>
>>>>>> *JDBC*
>>>>>> - The Drill JDBC driver layer needs to be updated to leverage our
>>>>>> ZooKeeper coordination locations so that it can correctly find the
>>>>>> cluster.
>>>>>> - The Drill JDBC driver should also manage reconnects, so that if
>>>>>> it loses its connection to a particular Drillbit partner, it will
>>>>>> reconnect to another available node in the cluster.
>>>>>> - Someone should point SQuirreL at Julian's latest work and see how
>>>>>> things go...
>>>>>>
>>>>>> *ByteCode Engineering*
>>>>>> - We need to put together a concrete class materialization
>>>>>> strategy. My thinking for relational operators and code generation
>>>>>> is that in most cases we'll have an interface and a template class
>>>>>> for a particular relational operator. We will build a template
>>>>>> class that has all the generic stuff implemented but makes calls to
>>>>>> empty methods where it expects lower-level operations to occur.
>>>>>> This allows things like looping and certain types of null
>>>>>> management to be fully materialized in source code without having
>>>>>> to deal with the complexities of bytecode generation. It also
>>>>>> eases testing complexity. When a particular implementation is
>>>>>> required, the Drillbit will be responsible for generating updated
>>>>>> method bodies as required for the record-level expressions, marking
>>>>>> all the methods and the class as final, then loading the
>>>>>> implementation into the query-level classloader. Note that the
>>>>>> production Drillbit will never load the template class into the JVM
>>>>>> and will simply utilize it in bytecode form. I was hoping someone
>>>>>> could take a look at pulling together a cohesive approach to doing
>>>>>> this using ASM and Janino (likely utilizing the JDK
>>>>>> commons-compiler mode). The interface should be pretty simple:
>>>>>> input is an interface, a template class name, a set of
>>>>>> (method_signature, method_body_text) objects and a varargs of
>>>>>> objects that are required for object instantiation. The return
>>>>>> should be an instance of the interface. The implementation should
>>>>>> check things like: each provided method_signature matches an
>>>>>> available method block, the method blocks being replaced are empty,
>>>>>> and the object constructor matches the set of arguments provided by
>>>>>> the instantiation request.
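[A sketch of what that entry point might look like. The names
(ClassMaterializer, ClassTransformationException) are illustrative
assumptions; the actual work of rewriting the template's bytecode with ASM
and compiling the injected method bodies with Janino would sit behind this
signature.]

    import java.util.Map;

    // Illustrative exception for any failure during compilation or bytecode merging.
    class ClassTransformationException extends Exception {
      ClassTransformationException(String message) { super(message); }
    }

    interface ClassMaterializer {
      /**
       * @param iface           the interface the generated class must implement
       * @param templateClass   fully-qualified name of the template class; read
       *                        as bytecode only, never loaded into the production JVM
       * @param methodBodies    method signature -> Java source text for its new
       *                        body; each signature must match an empty method
       *                        block in the template
       * @param constructorArgs arguments for the template's constructor
       * @return an instance of the finalized class, loaded in the query-level
       *         classloader
       */
      <T> T materialize(Class<T> iface,
                        String templateClass,
                        Map<String, String> methodBodies,
                        Object... constructorArgs) throws ClassTransformationException;
    }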
>>>>>>
>>>>>> *ByteBuf Improvements*
>>>>>> - Our BufferAllocator should support child allocators (getChild())
>>>>>> with their own memory maximums and accounting (so we can determine
>>>>>> the memory overhead of particular queries). We also need to be
>>>>>> able to release entire child allocations at once.
>>>>>> - We need to create a number of primitive-type-specific wrapping
>>>>>> classes for ByteBuf. These additions include fixed-offset indexing
>>>>>> for operations (e.g. index 1 of an int buffer should be at byte 4),
>>>>>> adding support for unsigned values (my preference would be to
>>>>>> leverage the work in Guava if that makes sense) and changing the
>>>>>> hard bounds checks to softer assert checks to increase production
>>>>>> performance. While we could do this using the ByteBuf interface,
>>>>>> from everything I've experienced and read we need to minimize
>>>>>> inlining and performance issues, so we really need to be able to
>>>>>> modify/refer to PooledUnsafeDirectByteBuf directly for the wrapping
>>>>>> classes. Of course, it is a final, package-private class. Short
>>>>>> term, that means we really need to create a number of specific
>>>>>> buffer types that wrap it and just put them in the io.netty.buffer
>>>>>> package (or alternatively create a Drill version or wrapper).
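[As a rough illustration of the second bullet, a sketch of one such
primitive-specific wrapper, written here against the generic ByteBuf
interface; per the note above, the real version would live in
io.netty.buffer and target PooledUnsafeDirectByteBuf directly. The class
name and shape are assumptions.]

    import io.netty.buffer.ByteBuf;

    public final class IntBufWrapper {
      private final ByteBuf buf;
      private final int valueCount;

      public IntBufWrapper(ByteBuf buf, int valueCount) {
        this.buf = buf;
        this.valueCount = valueCount;
      }

      public int get(int index) {
        // Soft bounds check: active in tests, compiled out when assertions
        // are disabled in production.
        assert index >= 0 && index < valueCount : "index out of bounds: " + index;
        return buf.getInt(index << 2);  // fixed-offset indexing: slot i is at byte i*4
      }

      public void set(int index, int value) {
        assert index >= 0 && index < valueCount : "index out of bounds: " + index;
        buf.setInt(index << 2, value);
      }

      // Unsigned read, along the lines of Guava's UnsignedInts helpers.
      public long getUnsigned(int index) {
        return get(index) & 0xFFFFFFFFL;
      }
    }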
