Good point, I'll try to ask the author. It's a pretty recent lib so that might be an oversight…
-david

On Apr 26, 2013, at 12:04 PM, Timothy Chen <[email protected]> wrote:

> Jacques, I think this is the one I emailed you before that has no
> licensing info.
>
> Tim
>
> Sent from my iPhone
>
> On Apr 26, 2013, at 9:30 AM, David Alves <[email protected]> wrote:
>
>> I've looked through it, and it looks like it can leverage shared memory,
>> which I was looking for anyway.
>> I also like the way garbage collection works (GC in Java also clears the
>> off-heap memory).
>> I'll take a deeper look during the weekend.
>>
>> -david
>>
>> On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]> wrote:
>>
>>> I've looked at that in the past and think the idea of using it here is
>>> very good. ByteBuf seems nice as it has things like endianness support,
>>> reference counting and management, and direct Netty integration. On the
>>> flip side, LArray is nice for its large-array capabilities and better
>>> input/output interfaces. The best approach might be to define a new
>>> ByteBuf implementation that leverages LArray. I'll take a look at this
>>> in a few days if someone else doesn't want to.
>>>
>>> j
>>>
>>> On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:
>>>
>>>> For *ByteBuf Improvements*, have you looked at LArrayJ
>>>> (https://github.com/xerial/larray)? It has those wrappers, and I found
>>>> it quite useful. The same author has also written a Java version of
>>>> Snappy compression. Not sure if you have plans to add compression, but
>>>> one of the nice things I could do was use the memory offsets for the
>>>> source (compressed data) and destination (uncompressed array) and do
>>>> the decompression off-heap. It supports lookup by index and has
>>>> wrappers for most of the primitive data types.
>>>>
>>>> Are you looking at something like this?
>>>>
>>>> thanks,
>>>> Kishore G
>>>>
>>>> On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]> wrote:
>>>>
>>>>> They are on the list, but the list is long :)
>>>>>
>>>>> Have a good weekend.
>>>>>
>>>>> On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:
>>>>>
>>>>>> So if no one picks anything up, you will be done with all the work
>>>>>> in the next couple of days? :)
>>>>>>
>>>>>> I'd like to help out, but I'm traveling to LA over the weekend.
>>>>>>
>>>>>> I'll sync with you Monday to see how I can help then.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>
>>>>>>> I'm working on the execwork stuff, and if someone would like to help
>>>>>>> out, here are a couple of things that need doing. I figured I'd drop
>>>>>>> them here and see if anyone wants to work on them in the next couple
>>>>>>> of days. If so, let me know; otherwise I'll be picking them up soon.
>>>>>>>
>>>>>>> *RPC*
>>>>>>> - RPC Layer Handshakes: Currently, I haven't implemented the
>>>>>>> handshake that should happen in either the User <> Bit or the
>>>>>>> Bit <> Bit layer. The plan is to use an additional inserted event
>>>>>>> handler that removes itself from the event pipeline after a
>>>>>>> successful handshake, or disconnects the channel on a failed
>>>>>>> handshake (with appropriate logging). The main validation at this
>>>>>>> point will simply be confirming that both endpoints are running the
>>>>>>> same protocol version. The only other information currently needed
>>>>>>> is that, in Bit <> Bit communication, the client should inform the
>>>>>>> server of its DrillEndpoint so that the server can map that for
>>>>>>> future communication in the other direction.
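
A minimal sketch of such a self-removing handshake handler, assuming Netty
4's final API; the Handshake type, version constant and logging are
illustrative, not anything specified in the thread:

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /** Hypothetical sketch: first handler in the pipeline. It validates the
     *  peer's protocol version, then removes itself so that steady-state
     *  traffic no longer pays for the check. */
    public class HandshakeHandler extends ChannelInboundHandlerAdapter {
      private static final Logger logger =
          LoggerFactory.getLogger(HandshakeHandler.class);
      private static final int PROTOCOL_VERSION = 1;  // illustrative constant

      /** Stand-in for the real handshake message (presumably protobuf). */
      public static final class Handshake {
        public final int version;
        public Handshake(int version) { this.version = version; }
      }

      @Override
      public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (!(msg instanceof Handshake)) {
          logger.error("Received {} before handshake; disconnecting.",
              msg.getClass());
          ctx.close();  // failed handshake: disconnect with logging
          return;
        }
        int version = ((Handshake) msg).version;
        if (version != PROTOCOL_VERSION) {
          logger.error("Protocol version mismatch: expected {}, got {}.",
              PROTOCOL_VERSION, version);
          ctx.close();  // failed handshake: disconnect with logging
          return;
        }
        ctx.pipeline().remove(this);  // success: drop out of the pipeline
      }
    }
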
>>>>>>> *DataTypes*
>>>>>>> - General Expansion: Currently, we have a hodgepodge of datatypes
>>>>>>> within org.apache.drill.common.expression.types.DataType. We need to
>>>>>>> clean this up. There should be types that map to the standard SQL
>>>>>>> types. My thinking is that we should actually have separate types for
>>>>>>> nullable, non-nullable and repeated values (optional, required and
>>>>>>> repeated in protobuf vernacular), since we'll generally operate on
>>>>>>> those values completely differently (and each type should reveal
>>>>>>> which it is). We should also have a relationship mapping from each to
>>>>>>> the others (e.g. how to convert a signed 32-bit int into a nullable
>>>>>>> signed 32-bit int); see the sketch below.
>>>>>>>
>>>>>>> - Map Types: We don't need nullable here, but we will need two
>>>>>>> different map types: inline and fieldwise. I think these will be
>>>>>>> useful for the execution engine and will be leveraged depending on
>>>>>>> the particular needs: for example, fieldwise will be a natural fit
>>>>>>> where we're operating on columnar data and doing an explode or other
>>>>>>> fieldwise nested operation, while inline will be useful when we're
>>>>>>> doing things like sorting a complex field. Inline will also be
>>>>>>> appropriate where we have extremely sparse record sets. We'll just
>>>>>>> need transformation methods between the two variations. In the case
>>>>>>> of a fieldwise map type field, the field is virtual and only exists
>>>>>>> to contain its child fields.
>>>>>>>
>>>>>>> - Non-static DataTypes: We need types that don't fit the static data
>>>>>>> type model above. Examples include fixed-width types (e.g. a 10-byte
>>>>>>> string), polymorphic (inline-encoded) types (number or string
>>>>>>> depending on the record) and repeated nested versions of our other
>>>>>>> types. These are a little more gnarly, as we need to support
>>>>>>> canonicalization of them. Optiq has some methods for handling this
>>>>>>> kind of type system, so it probably makes sense to leverage that.
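
A minimal sketch of the mode-aware type idea above; every name here
(MajorType, MinorType, DataMode) is illustrative, not something the thread
fixes:

    /** Hypothetical sketch: each type carries its protobuf-style
     *  cardinality and can produce its nullable counterpart. */
    public final class MajorType {

      /** Optional = nullable, required = non-nullable, in protobuf terms. */
      public enum DataMode { OPTIONAL, REQUIRED, REPEATED }

      /** Abbreviated set of SQL-ish base types. */
      public enum MinorType { INT, BIGINT, VARCHAR, FLOAT8 }

      private final MinorType minorType;
      private final DataMode mode;

      public MajorType(MinorType minorType, DataMode mode) {
        this.minorType = minorType;
        this.mode = mode;
      }

      public MinorType getMinorType() { return minorType; }
      public DataMode getMode() { return mode; }

      /** Relationship mapping, e.g. required signed 32-bit int ->
       *  nullable signed 32-bit int. */
      public MajorType asNullable() {
        return mode == DataMode.OPTIONAL
            ? this
            : new MajorType(minorType, DataMode.OPTIONAL);
      }
    }
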
>>>>>>> *Expression Type Materialization*
>>>>>>> - LogicalExpression type materialization: Right now,
>>>>>>> LogicalExpressions include support for late type binding. As part of
>>>>>>> the record batch execution path, these need to be materialized with
>>>>>>> correct casting, etc., based on the actual schema found. As such, we
>>>>>>> need a function which takes a LogicalExpression tree, applies a
>>>>>>> materialized BatchSchema and returns a new LogicalExpression tree
>>>>>>> with full type settings (see the sketch below). As part of this
>>>>>>> process, all types need to be cast as necessary, and full validation
>>>>>>> of the tree should be done. Timothy has pending validation work on a
>>>>>>> pull request that would be a good piece of code to leverage for this.
>>>>>>> We also have a visitor model for the expression tree that should be
>>>>>>> able to aid in constructing the updated LogicalExpression.
>>>>>>>
>>>>>>> - LogicalExpression to Java expression conversion: We need to be able
>>>>>>> to convert our logical expressions into Java code expressions.
>>>>>>> Initially, this should be done in a simplistic way, using things like
>>>>>>> implicit boxing just to get something working. This will likely be
>>>>>>> specialized per major type (nullable, non-nullable and repeated), and
>>>>>>> a framework that distinguishes LogicalExpressions by these types
>>>>>>> might actually make the most sense.
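
A rough sketch of the materialization function described above. Only
LogicalExpression and BatchSchema are named in the thread; the visitor
interface, node types and copy helpers here are all assumed:

    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical sketch: rewrite a late-bound expression tree against a
     *  concrete schema, resolving each node's type and inserting casts. */
    public final class ExpressionMaterializer {

      public static LogicalExpression materialize(LogicalExpression expr,
                                                  BatchSchema schema) {
        return expr.accept(new Materializer(schema));  // assumed visitor hook
      }

      private static final class Materializer
          implements ExprVisitor<LogicalExpression> {  // assumed interface
        private final BatchSchema schema;

        Materializer(BatchSchema schema) { this.schema = schema; }

        @Override
        public LogicalExpression visitFieldReference(FieldReference ref) {
          // Late binding is resolved here: look up the field's actual type.
          MajorType resolved = schema.getField(ref.getPath()).getType();
          return ref.withType(resolved);  // assumed copy-with-type helper
        }

        @Override
        public LogicalExpression visitFunctionCall(FunctionCall call) {
          // Materialize children first, then cast any argument whose
          // resolved type does not match the function's expected type.
          List<LogicalExpression> args = new ArrayList<>();
          for (int i = 0; i < call.argCount(); i++) {    // assumed accessors
            LogicalExpression typed = call.arg(i).accept(this);
            args.add(typed.getType().equals(call.expectedArgType(i))
                ? typed
                : new CastExpression(typed, call.expectedArgType(i)));
          }
          return call.withArgs(args);  // assumed copy-with-children helper
        }
      }
    }
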
>>>>>>> *JDBC*
>>>>>>> - The Drill JDBC driver layer needs to be updated to leverage our
>>>>>>> ZooKeeper coordination locations so that it can correctly find the
>>>>>>> cluster.
>>>>>>> - The Drill JDBC driver should also manage reconnects, so that if it
>>>>>>> loses its connection to a particular Drillbit partner, it will
>>>>>>> reconnect to another available node in the cluster.
>>>>>>> - Someone should point SQuirreL at Julian's latest work and see how
>>>>>>> things go...
>>>>>>>
>>>>>>> *ByteCode Engineering*
>>>>>>> - We need to put together a concrete class materialization strategy.
>>>>>>> My thinking for relational operators and code generation is that in
>>>>>>> most cases we'll have an interface and a template class for a
>>>>>>> particular relational operator. We will build a template class that
>>>>>>> has all the generic stuff implemented but makes calls to empty
>>>>>>> methods where it expects lower-level operations to occur. This allows
>>>>>>> things like looping and certain types of null management to be fully
>>>>>>> materialized in source code, without having to deal with the
>>>>>>> complexities of bytecode generation. It also eases testing
>>>>>>> complexity. When a particular implementation is required, the
>>>>>>> Drillbit will be responsible for generating updated method bodies as
>>>>>>> required for the record-level expressions, marking all the methods
>>>>>>> and the class as final, then loading the implementation into the
>>>>>>> query-level classloader. Note that the production Drillbit will never
>>>>>>> load the template class into the JVM and will simply utilize it in
>>>>>>> bytecode form. I was hoping someone could take a look at pulling
>>>>>>> together a cohesive approach to doing this using ASM and Janino
>>>>>>> (likely utilizing the JDK commons-compiler mode). The interface
>>>>>>> should be pretty simple (a sketch follows at the end of this
>>>>>>> message): input is an interface, a template class name, a set of
>>>>>>> (method_signature, method_body_text) objects and a varargs of objects
>>>>>>> that are required for object instantiation. The return should be an
>>>>>>> instance of the interface. The implementation should check things
>>>>>>> like: each provided method_signature matches an available method
>>>>>>> block; the method blocks being replaced are empty; the object
>>>>>>> constructor matches the set of object arguments provided by the
>>>>>>> instantiation request; etc.
>>>>>>>
>>>>>>> *ByteBuf Improvements*
>>>>>>> - Our BufferAllocator should support child allocators (getChild())
>>>>>>> with their own memory maximums and accounting (so we can determine
>>>>>>> the memory overhead of particular queries). We also need to be able
>>>>>>> to release entire child allocations at once.
>>>>>>> - We need to create a number of primitive-type-specific wrapping
>>>>>>> classes for ByteBuf (also sketched below). These additions include
>>>>>>> fixed-offset indexing for operations (e.g. index 1 of an int buffer
>>>>>>> should be at byte 4), support for unsigned values (my preference
>>>>>>> would be to leverage the work in Guava if that makes sense) and
>>>>>>> softening the hard bounds checks into assert-based checks to increase
>>>>>>> production performance. While we could do this using the ByteBuf
>>>>>>> interface, from everything I've experienced and read, we need to
>>>>>>> minimize inlining and performance issues, so we really need to be
>>>>>>> able to modify/refer to PooledUnsafeDirectByteBuf directly for the
>>>>>>> wrapping classes. Of course, it is a final, package-private class.
>>>>>>> Short term, that means we really need to create a number of specific
>>>>>>> buffer types that wrap it and just put them in the io.netty.buffer
>>>>>>> package (or alternatively create a Drill version or wrapper).
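
For the *ByteCode Engineering* item above, a possible shape for the
generation entry point; all names are illustrative, and the validation
rules are the ones listed in the message:

    import java.util.Set;

    /** Hypothetical sketch of the class materialization interface. */
    public interface ClassGenerator {

      /** One (method_signature, method_body_text) pair targeting an empty
       *  method block in the template class. */
      final class MethodReplacement {
        public final String signature;  // e.g. "void doEval(int in, int out)"
        public final String bodyText;   // Java source, compiled via Janino
        public MethodReplacement(String signature, String bodyText) {
          this.signature = signature;
          this.bodyText = bodyText;
        }
      }

      /**
       * Merge the generated method bodies into the template's bytecode
       * (e.g. with ASM), mark the class and its methods final, load the
       * result into the query-level classloader, and return an instance.
       * Implementations should validate that each signature names an empty
       * method block in the template and that constructorArgs match a
       * template constructor.
       */
      <T> T implement(Class<T> iface,
                      String templateClassName,
                      Set<MethodReplacement> replacements,
                      Object... constructorArgs);
    }
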

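And for the *ByteBuf Improvements* item, a minimal sketch of a fixed-offset
int wrapper with assert-based bounds checks. The class and method names are
illustrative; as the message notes, a real version would wrap
PooledUnsafeDirectByteBuf directly and therefore live in io.netty.buffer:

    package io.netty.buffer;  // placed here to reach package-private types

    /** Hypothetical sketch: an int-indexed view over a ByteBuf. Index 1
     *  maps to byte offset 4; bounds are checked with asserts rather than
     *  hard checks. */
    public final class IntBuf {
      private final ByteBuf delegate;

      public IntBuf(ByteBuf delegate) { this.delegate = delegate; }

      public int get(int index) {
        assert index >= 0 && (index + 1) * 4 <= delegate.capacity()
            : "index out of bounds: " + index;
        return delegate.getInt(index * 4);  // 4 bytes per int element
      }

      public void set(int index, int value) {
        assert index >= 0 && (index + 1) * 4 <= delegate.capacity()
            : "index out of bounds: " + index;
        delegate.setInt(index * 4, value);
      }
    }

Since asserts are no-ops unless the JVM runs with -ea, the bounds checks
stay active in tests but disappear in production, which matches the goal
stated in the message.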