Ya, just bringing that up again. Doubt it will be a blocker.

Tim
On Fri, Apr 26, 2013 at 10:12 AM, David Alves <[email protected]> wrote:

> good point, i'll try and ask the author.
> it's a pretty recent lib so that might be an oversight…
>
> -david
>
> On Apr 26, 2013, at 12:04 PM, Timothy Chen <[email protected]> wrote:
>
> > Jacques, I think this is the one I emailed you before that has no licensing info.
> >
> > Tim
> >
> > Sent from my iPhone
> >
> > On Apr 26, 2013, at 9:30 AM, David Alves <[email protected]> wrote:
> >
> >> i've looked through it and it looks like it can leverage shared memory, which I was looking for anyway.
> >> I also like the way garbage collection works (gc in java also clears off-heap).
> >> I'll take a deeper look during the weekend.
> >>
> >> -david
> >>
> >> On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]> wrote:
> >>
> >>> I've looked at that in the past and think the idea of using it here is very good. ByteBuf seems nice as it has things like endianness capabilities, reference counting and management, and direct Netty support. On the flip side, larray is nice for its large-array capabilities and better input/output interfaces. The best approach might be to define a new ByteBuf implementation that leverages LArray. I'll take a look at this in a few days if someone else doesn't want to.
> >>>
> >>> j
> >>>
> >>> On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:
> >>>
> >>>> For *ByteBuf Improvements*, have you looked at LArrayJ?
> >>>> https://github.com/xerial/larray
> >>>> It has those wrappers and I found it quite useful. The same person has also written a Java version of Snappy compression. Not sure if you have plans to add compression, but one of the nice things I could do was use the memory offsets for the source (compressed data) and dest (uncompressed array) and do the decompression off-heap.
> >>>> It supports the need for looking up by index and has wrappers for most of the primitive data types.
> >>>>
> >>>> Are you looking at something like this?
> >>>>
> >>>> thanks,
> >>>> Kishore G
> >>>>
> >>>> On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]> wrote:
> >>>>
> >>>>> They are on the list but the list is long :)
> >>>>>
> >>>>> Have a good weekend.
> >>>>>
> >>>>> On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:
> >>>>>
> >>>>>> So if no one picks anything up, you will be done with all the work in the next couple of days? :)
> >>>>>>
> >>>>>> Would like to help out, but I'm traveling to LA over the weekend.
> >>>>>>
> >>>>>> I'll sync with you Monday to see how I can help then.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>> On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]> wrote:
> >>>>>>
> >>>>>>> I'm working on the execwork stuff, and if someone would like to help out, here are a couple of things that need doing. I figured I'd drop them here and see if anyone wants to work on them in the next couple of days. If so, let me know; otherwise I'll be picking them up soon.
> >>>>>>>
> >>>>>>> *RPC*
> >>>>>>> - RPC Layer Handshakes: Currently, I haven't implemented the handshake that should happen in either the User <> Bit or the Bit <> Bit layer. The plan was to use an additional inserted event handler that removes itself from the event pipeline after a successful handshake, or disconnects the channel on a failed handshake (with appropriate logging). The main validation at this point will simply be confirming that both endpoints are running on the same protocol version.
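A minimal sketch of the self-removing handshake handler described above. The `Pipeline`/`Handler` abstraction and `RPC_VERSION` constant are hypothetical stand-ins; real code would use Netty's `ChannelPipeline` and remove the handler via the pipeline API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HandshakeSketch {
    static final int RPC_VERSION = 1; // hypothetical protocol version constant

    interface Handler { void onMessage(Pipeline p, Object msg); }

    // Toy stand-in for a Netty ChannelPipeline: first handler sees each message.
    static final class Pipeline {
        final Deque<Handler> handlers = new ArrayDeque<>();
        boolean open = true;
        void fire(Object msg) {
            if (open && !handlers.isEmpty()) handlers.peekFirst().onMessage(this, msg);
        }
    }

    /** Validates the peer's version, then removes itself or disconnects. */
    static final class HandshakeHandler implements Handler {
        public void onMessage(Pipeline p, Object msg) {
            int peerVersion = (Integer) msg;
            if (peerVersion == RPC_VERSION) {
                p.handlers.removeFirst();   // success: drop out of the pipeline
            } else {
                p.open = false;             // failure: disconnect the channel
            }
        }
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline();
        p.handlers.addFirst(new HandshakeHandler());
        p.fire(RPC_VERSION);                // matching version: handler removes itself
        System.out.println("open=" + p.open + " handlers=" + p.handlers.size());
    }
}
```

After a successful handshake the handler is gone, so later messages pay no handshake cost; a mismatched version closes the (toy) channel instead.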
> >>>>>>> The only other information that is currently needed is that in Bit <> Bit communication, the client should inform the server of its DrillEndpoint so that the server can map that for future communication in the other direction.
> >>>>>>>
> >>>>>>> *DataTypes*
> >>>>>>> - General Expansion: Currently, we have a hodgepodge of datatypes within org.apache.drill.common.expression.types.DataType. We need to clean this up. There should be types that map to standard SQL types. My thinking is that we should actually have separate types for each of nullable, non-nullable and repeated (optional, required and repeated in protobuf vernacular), since we'll generally operate on those values completely differently (and each type should reveal which it is). We should also have a relationship mapping from each to the other (e.g. how to convert a signed 32-bit int into a nullable signed 32-bit int).
> >>>>>>>
> >>>>>>> - Map Types: We don't need nullable, but we will need different map types: inline and fieldwise. I think these will be useful for the execution engine and will be leveraged depending on the particular needs -- for example, fieldwise will be a natural fit where we're operating on columnar data and doing an explode or other fieldwise nested operation, and inline will be useful when we're doing things like sorting a complex field. Inline will also be appropriate where we have extremely sparse record sets. We'll just need transformation methods between the two variations. In the case of a fieldwise map type field, the field is virtual and only exists to contain its child fields.
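The separate-type-per-mode idea above, with the relationship mapping between modes, could look roughly like this sketch. `Mode`, `Kind` and `MajorType` are illustrative names, not Drill's actual classes, and the kind list is trimmed to two entries.

```java
public class TypeModeSketch {
    enum Mode { REQUIRED, OPTIONAL, REPEATED }   // protobuf vernacular
    enum Kind { INT32, VARCHAR }                 // trimmed to two kinds for the sketch

    // One concrete type per (kind, mode) pair, so the type reveals its mode.
    static final class MajorType {
        final Kind kind;
        final Mode mode;
        MajorType(Kind kind, Mode mode) { this.kind = kind; this.mode = mode; }

        /** The mapping called out above: required int -> nullable int, etc. */
        MajorType asOptional() { return new MajorType(kind, Mode.OPTIONAL); }

        @Override public String toString() { return mode + " " + kind; }
    }

    public static void main(String[] args) {
        MajorType requiredInt = new MajorType(Kind.INT32, Mode.REQUIRED);
        System.out.println(requiredInt.asOptional()); // OPTIONAL INT32
    }
}
```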
> >>>>>>>
> >>>>>>> - Non-static DataTypes: We need types that don't fit the static data type model above. Examples include fixed-width types (e.g. a 10-byte string), polymorphic (inline encoded) types (number or string depending on the record) and repeated nested versions of our other types. These are a little more gnarly, as we need to support canonicalization of these. Optiq has some methods for how to handle this kind of type system, so it probably makes sense to leverage that system.
> >>>>>>>
> >>>>>>> *Expression Type Materialization*
> >>>>>>> - LogicalExpression type materialization: Right now, LogicalExpressions include support for late type binding. As part of the record batch execution path, these need to get materialized with correct casting, etc. based on the actual found schema. As such, we need a function which takes a LogicalExpression tree, applies a materialized BatchSchema and returns a new LogicalExpression tree with full type settings. As part of this process, all types need to be cast as necessary and full validation of the tree should be done. Timothy has pending work on a pull request, specifically for validation, that would be a good piece of code to leverage for that need. We also have a visitor model for the expression tree that should be able to aid in the updated LogicalExpression construction.
> >>>>>>>
> >>>>>>> - LogicalExpression to Java expression conversion: We need to be able to convert our logical expressions into Java code expressions. Initially, this should be done in a simplistic way, using something like implicit boxing and the like just to get something working.
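The materialization pass described above might be sketched like this: resolve each late-bound field against the found schema and wrap it in a cast where the requested type differs. `Expr`, `FieldRef`, `Cast` and the string-typed schema are all simplifications invented for this sketch, not Drill's LogicalExpression API.

```java
import java.util.Map;

public class MaterializeSketch {
    interface Expr { String type(); }

    // A field reference whose type has been resolved from the schema.
    static final class FieldRef implements Expr {
        final String name; final String resolved;
        FieldRef(String name, String resolved) { this.name = name; this.resolved = resolved; }
        public String type() { return resolved; }
    }

    // A cast inserted by materialization when the found type differs.
    static final class Cast implements Expr {
        final Expr input; final String target;
        Cast(Expr input, String target) { this.input = input; this.target = target; }
        public String type() { return target; }
    }

    /** Returns a new, fully typed expression, casting where necessary. */
    static Expr materialize(String fieldName, String wantedType, Map<String, String> schema) {
        String found = schema.get(fieldName);
        if (found == null) throw new IllegalArgumentException("unknown field: " + fieldName);
        Expr typed = new FieldRef(fieldName, found);
        return found.equals(wantedType) ? typed : new Cast(typed, wantedType);
    }

    public static void main(String[] args) {
        Map<String, String> schema = Map.of("a", "INT32");
        System.out.println(materialize("a", "INT64", schema).type()); // INT64, via a cast
    }
}
```

A full version would walk an arbitrary tree with the existing visitor model rather than a single field, but the shape (old tree in, new typed tree out) is the same.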
> >>>>>>> This will likely be specialized per major type (nullable, non-nullable and repeated), and a framework might make the most sense, actually just distinguishing LogicalExpressions by these types.
> >>>>>>>
> >>>>>>> *JDBC*
> >>>>>>> - The Drill JDBC driver layer needs to be updated to leverage our ZooKeeper coordination locations so that it can correctly find the cluster location.
> >>>>>>> - The Drill JDBC driver should also manage reconnects, so that if it loses its connection with a particular Drillbit partner, it will reconnect to another available node in the cluster.
> >>>>>>> - Someone should point SQuirreL at Julian's latest work and see how things go...
> >>>>>>>
> >>>>>>> *ByteCode Engineering*
> >>>>>>> - We need to put together a concrete class materialization strategy. My thinking for relational operators and code generation is that in most cases, we'll have an interface and a template class for a particular relational operator. We will build a template class that has all the generic stuff implemented but makes calls to empty methods where it expects lower-level operations to occur. This allows things like looping and certain types of null management to be fully materialized in source code without having to deal with the complexities of bytecode generation. It also eases testing complexity. When a particular implementation is required, the Drillbit will be responsible for generating updated method bodies as required for the record-level expressions, marking all the methods and the class as final, then loading the implementation into the query-level classloader.
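The template-class strategy above can be sketched in plain Java: the template implements the generic loop and calls an empty hook method that generated code later fills in. Here a hand-written subclass stands in for the bytecode-generated one, and all names (`Filterer`, `FilterTemplate`, `doEval`) are illustrative.

```java
public class TemplateSketch {
    interface Filterer { int filter(int[] batch); }

    /** Template: looping is fully materialized in source; doEval is the empty hook. */
    static abstract class FilterTemplate implements Filterer {
        public final int filter(int[] batch) {
            int kept = 0;
            for (int value : batch) {
                if (doEval(value)) kept++;      // record-level expression goes here
            }
            return kept;
        }
        protected boolean doEval(int value) { return false; }  // empty method body
    }

    /** Stand-in for a generated implementation: only the hook body differs. */
    static final class GeneratedFilter extends FilterTemplate {
        @Override protected boolean doEval(int value) { return value > 10; }
    }

    public static void main(String[] args) {
        System.out.println(new GeneratedFilter().filter(new int[]{5, 11, 42})); // 2
    }
}
```

Because the loop and null handling live in ordinary source, they can be unit-tested directly; only the tiny hook bodies would come from the bytecode generator.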
> >>>>>>> Note that the production Drillbit will never load the template class into the JVM and will simply utilize it in bytecode form. I was hoping someone could take a look at trying to pull together a cohesive approach to doing this using ASM and Janino (likely utilizing the JDK commons-compiler mode). The interface should be pretty simple: input is an interface, a template class name, a set of (method_signature, method_body_text) objects and a varargs of objects that are required for object instantiation. The return should be an instance of the interface. The implementation should check things like the provided method_signatures against the available method blocks, that the method blocks being replaced are empty, that the object constructor matches the set of object arguments provided by the instantiation request, etc.
> >>>>>>>
> >>>>>>> *ByteBuf Improvements*
> >>>>>>> - Our BufferAllocator should support child allocators (getChild()) with their own memory maximums and accounting (so we can determine the memory overhead of particular queries). We also need to be able to release entire child allocations at once.
> >>>>>>> - We need to create a number of primitive-type-specific wrapping classes for ByteBuf. These additions include fixed-offset indexing for operations (e.g. index 1 of an int buffer should be at 4 bytes), adding support for unsigned values (my preference would be to leverage the work in Guava if that makes sense) and modifying the hard bounds checks to softer assert checks to increase production performance.
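The child-allocator idea above, reduced to its accounting core: each child gets its own maximum and usage counter, and `close()` releases the whole child allocation at once. This is an accounting-only illustration with hypothetical names; it deliberately omits charging the parent for child usage, which a real allocator would also do.

```java
import java.util.concurrent.atomic.AtomicLong;

public class AllocatorSketch {
    static class BufferAllocator {
        final long maximum;                      // this allocator's memory cap
        final AtomicLong used = new AtomicLong();
        BufferAllocator(long maximum) { this.maximum = maximum; }

        /** Child with its own cap, so per-query overhead can be measured. */
        BufferAllocator getChild(long childMax) { return new BufferAllocator(childMax); }

        /** Reserve bytes against this allocator's cap; roll back on overflow. */
        boolean allocate(long bytes) {
            long now = used.addAndGet(bytes);
            if (now > maximum) { used.addAndGet(-bytes); return false; }
            return true;
        }

        /** Release the entire allocation (e.g. a whole query's memory) at once. */
        void close() { used.set(0); }
    }

    public static void main(String[] args) {
        BufferAllocator root = new BufferAllocator(1 << 20);
        BufferAllocator query = root.getChild(1024);
        System.out.println(query.allocate(512));   // true: under the child max
        System.out.println(query.allocate(1024));  // false: would exceed the child max
        query.close();
        System.out.println(query.used.get());      // 0: whole child allocation released
    }
}
```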
> >>>>>>> While we could do this utilizing the ByteBuf interface, from everything I've experienced and read, we need to minimize issues with inlining and performance, so we really need to be able to modify/refer to PooledUnsafeDirectByteBuf directly for the wrapping classes. Of course, it is a final, package-private class. Short term, that means we really need to create a number of specific buffer types that wrap it and just put them in the io.netty.buffer package (or alternatively create a Drill version or wrapper).
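The fixed-offset wrapper idea above might look like this sketch: index 1 of an int view lives at byte offset 4, and bounds are checked with soft asserts (disabled in production unless `-ea` is set) rather than hard exceptions. `java.nio.ByteBuffer` stands in here for Netty's `PooledUnsafeDirectByteBuf`, and `IntWrapper` is an invented name.

```java
import java.nio.ByteBuffer;

public class IntWrapperSketch {
    static final class IntWrapper {
        private final ByteBuffer buf;
        IntWrapper(ByteBuffer buf) { this.buf = buf; }

        int get(int index) {
            // Soft bounds check: free when assertions are disabled in production.
            assert index >= 0 && index * 4 < buf.capacity() : "index " + index;
            return buf.getInt(index * 4);       // fixed offset: index * 4 bytes
        }

        void set(int index, int value) {
            assert index >= 0 && index * 4 < buf.capacity() : "index " + index;
            buf.putInt(index * 4, value);
        }
    }

    public static void main(String[] args) {
        IntWrapper ints = new IntWrapper(ByteBuffer.allocateDirect(16)); // room for 4 ints
        ints.set(1, 42);                        // written at byte offset 4
        System.out.println(ints.get(1));        // 42
    }
}
```

Equivalent wrappers for the other primitive widths would only differ in the element size and the get/put calls used.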
