Great news! Thanks for running that down. J
On Sat, Apr 27, 2013 at 8:54 AM, kishore g <[email protected]> wrote:

> Good news, the author of larray got back and he will add the Apache
> license to the source.
> On Apr 26, 2013 11:13 AM, "kishore g" <[email protected]> wrote:
>
>> I have interacted with the author, let me know if you want me to check.
>> Good thing was that he is responsive and even added a few things for me.
>>
>>
>> On Fri, Apr 26, 2013 at 10:27 AM, Timothy Chen <[email protected]> wrote:
>>
>>> Ya, just bringing that up again. Doubt it will be a blocker.
>>>
>>> Tim
>>>
>>>
>>> On Fri, Apr 26, 2013 at 10:12 AM, David Alves <[email protected]> wrote:
>>>
>>> > good point, i'll try and ask the author.
>>> > it's a pretty recent lib so that might be an oversight…
>>> >
>>> > -david
>>> >
>>> > On Apr 26, 2013, at 12:04 PM, Timothy Chen <[email protected]> wrote:
>>> >
>>> > > Jacques I think this is the one I emailed you before that has no
>>> > > licensing info.
>>> > >
>>> > > Tim
>>> > >
>>> > > Sent from my iPhone
>>> > >
>>> > > On Apr 26, 2013, at 9:30 AM, David Alves <[email protected]> wrote:
>>> > >
>>> > >> i've looked through it and looks like it can leverage shared
>>> > >> memory, which I was looking for anyway.
>>> > >> I also like the way garbage collection works (gc in java also
>>> > >> clears off-heap).
>>> > >> I'll take a deeper look during the weekend.
>>> > >>
>>> > >> -david
>>> > >>
>>> > >> On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]> wrote:
>>> > >>
>>> > >>> I've looked at that in the past and think the idea of using it
>>> > >>> here is very good. It seems like ByteBuf is nice as it has things
>>> > >>> like endianness capabilities, reference counting and management
>>> > >>> and Netty direct support. On the flipside, larray is nice for its
>>> > >>> large array capabilities and better input/output interfaces. The
>>> > >>> best approach might be to define a new ByteBuf implementation
>>> > >>> that leverages LArray. I'll take a look at this in a few days if
>>> > >>> someone else doesn't want to.
>>> > >>>
>>> > >>> j
>>> > >>>
>>> > >>> On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:
>>> > >>>
>>> > >>>> For *ByteBuf Improvements*, have you looked at LArrayJ
>>> > >>>> (https://github.com/xerial/larray)? It has those wrappers and I
>>> > >>>> found it quite useful. The same person has also written a Java
>>> > >>>> version of Snappy compression. Not sure if you guys have plans
>>> > >>>> to add compression, but one of the nice things I could do was
>>> > >>>> use the memory offsets for source (compressed data) and dest
>>> > >>>> (uncompressed array) and do the decompression off-heap. It
>>> > >>>> supports the need for looking up by index and has wrappers for
>>> > >>>> most of the primitive data types.
>>> > >>>>
>>> > >>>> Are you looking at something like this?
>>> > >>>>
>>> > >>>> thanks,
>>> > >>>> Kishore G
>>> > >>>>
>>> > >>>>
>>> > >>>>
>>> > >>>> On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]> wrote:
>>> > >>>>
>>> > >>>>> They are on the list but the list is long :)
>>> > >>>>>
>>> > >>>>> Have a good weekend.
>>> > >>>>>
>>> > >>>>> On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:
>>> > >>>>>
>>> > >>>>>> So if no one picks anything up you will be done with all the
>>> > >>>>>> work in the next couple of days? :)
>>> > >>>>>>
>>> > >>>>>> Would like to help out but I'm traveling to LA over the weekend.
>>> > >>>>>>
>>> > >>>>>> I'll sync with you Monday to see how I can help then.
>>> > >>>>>>
>>> > >>>>>> Tim
>>> > >>>>>>
>>> > >>>>>> Sent from my iPhone
>>> > >>>>>>
>>> > >>>>>> On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]> wrote:
>>> > >>>>>>
>>> > >>>>>>> I'm working on the execwork stuff and if someone would like to
>>> > >>>>>>> help out, here are a couple of things that need doing. I
>>> > >>>>>>> figured I'd drop them here and see if anyone wants to work on
>>> > >>>>>>> them in the next couple of days. If so, let me know, otherwise
>>> > >>>>>>> I'll be picking them up soon.
>>> > >>>>>>>
>>> > >>>>>>> *RPC*
>>> > >>>>>>> - RPC Layer Handshakes: Currently, I haven't implemented the
>>> > >>>>>>> handshake that should happen in either the User <> Bit or the
>>> > >>>>>>> Bit <> Bit layer. The plan was to use an additional inserted
>>> > >>>>>>> event handler that removed itself from the event pipeline
>>> > >>>>>>> after a successful handshake or disconnected the channel on a
>>> > >>>>>>> failed handshake (with appropriate logging). The main
>>> > >>>>>>> validation at this point will be simply confirming that both
>>> > >>>>>>> endpoints are running on the same protocol version. The only
>>> > >>>>>>> other information that is currently needed is that, in the
>>> > >>>>>>> Bit <> Bit communication, the client should inform the server
>>> > >>>>>>> of its DrillEndpoint so that the server can then map that for
>>> > >>>>>>> future communication in the other direction.
>>> > >>>>>>>
>>> > >>>>>>> *DataTypes*
>>> > >>>>>>> - General Expansion: Currently, we have a hodgepodge of
>>> > >>>>>>> datatypes within the
>>> > >>>>>>> org.apache.drill.common.expression.types.DataType. We need to
>>> > >>>>>>> clean this up. There should be types that map to standard SQL
>>> > >>>>>>> types. My thinking is that we should actually have separate
>>> > >>>>>>> types for each of nullable, non-nullable and repeated
>>> > >>>>>>> (required, optional and repeated in protobuf vernacular) since
>>> > >>>>>>> we'll generally operate with those values completely
>>> > >>>>>>> differently (and that each type should reveal which it is).
>>> > >>>>>>> We should also have a relationship mapping from each to the
>>> > >>>>>>> other (e.g. how to convert a signed 32 bit int into a nullable
>>> > >>>>>>> signed 32 bit int).
>>> > >>>>>>>
>>> > >>>>>>> - Map Types: We don't need nullable but we will need different
>>> > >>>>>>> map types: inline and fieldwise. I think these will be useful
>>> > >>>>>>> for the execution engine and will be leveraged depending on
>>> > >>>>>>> the particular needs. For example, fieldwise will be a natural
>>> > >>>>>>> fit where we're operating on columnar data and doing an
>>> > >>>>>>> explode or other fieldwise nested operation, and inline will
>>> > >>>>>>> be useful when we're doing things like sorting a complex
>>> > >>>>>>> field. Inline will also be appropriate where we have extremely
>>> > >>>>>>> sparse record sets. We'll just need transformation methods
>>> > >>>>>>> between the two variations. In the case of a fieldwise map
>>> > >>>>>>> type field, the field is virtual and only exists to contain
>>> > >>>>>>> its child fields.
>>> > >>>>>>>
>>> > >>>>>>> - Non-static DataTypes: We need types that don't fit the
>>> > >>>>>>> static data type model above. Examples include fixed width
>>> > >>>>>>> types (e.g. 10 byte string), polymorphic (inline encoded)
>>> > >>>>>>> types (number or string depending on record) and repeated
>>> > >>>>>>> nested versions of our other types. These are a little more
>>> > >>>>>>> gnarly as we need to support canonicalization of these. Optiq
>>> > >>>>>>> has some methods for how to handle this kind of type system so
>>> > >>>>>>> it probably makes sense to leverage that system.
>>> > >>>>>>>
>>> > >>>>>>> *Expression Type Materialization*
>>> > >>>>>>> - LogicalExpression type materialization: Right now,
>>> > >>>>>>> LogicalExpressions include support for late type binding. As
>>> > >>>>>>> part of the record batch execution path, these need to get
>>> > >>>>>>> materialized with correct casting, etc. based on the actual
>>> > >>>>>>> found schema. As such, we need to have a function which takes
>>> > >>>>>>> a LogicalExpression tree, applies a materialized BatchSchema
>>> > >>>>>>> and returns a new LogicalExpression tree with full type
>>> > >>>>>>> settings. As part of this process, all types need to be cast
>>> > >>>>>>> as necessary and full validation of the tree should be done.
>>> > >>>>>>> Timothy has pending validation work on a pull request that
>>> > >>>>>>> would be a good piece of code to leverage for that need. We
>>> > >>>>>>> also have a visitor model for the expression tree that should
>>> > >>>>>>> be able to aid in the updated LogicalExpression construction.
>>> > >>>>>>> - LogicalExpression to Java expression conversion: We need to
>>> > >>>>>>> be able to convert our logical expressions into Java code
>>> > >>>>>>> expressions. Initially, this should be done in a simplistic
>>> > >>>>>>> way, using something like implicit boxing and the like just to
>>> > >>>>>>> get something working. This will likely be specialized per
>>> > >>>>>>> major type (nullable, non-nullable and repeated) and a
>>> > >>>>>>> framework might make the most sense actually just
>>> > >>>>>>> distinguishing the LogicalExpression by these types.
>>> > >>>>>>>
>>> > >>>>>>> *JDBC*
>>> > >>>>>>> - The Drill JDBC driver layer needs to be updated to leverage
>>> > >>>>>>> our zookeeper coordination locations so that it can correctly
>>> > >>>>>>> find the cluster location.
>>> > >>>>>>> - The Drill JDBC driver should also manage reconnects so that
>>> > >>>>>>> if it loses connection with a particular Drillbit partner, it
>>> > >>>>>>> will reconnect to another available node in the cluster.
>>> > >>>>>>> - Someone should point SQuirreL at Julian's latest work and
>>> > >>>>>>> see how things go...
>>> > >>>>>>>
>>> > >>>>>>> *ByteCode Engineering*
>>> > >>>>>>> - We need to put together a concrete class materialization
>>> > >>>>>>> strategy. My thinking for relational operators and code
>>> > >>>>>>> generation is that in most cases, we'll have an interface and
>>> > >>>>>>> a template class for a particular relational operator. We will
>>> > >>>>>>> build a template class that has all the generic stuff
>>> > >>>>>>> implemented but will make calls to empty methods where it
>>> > >>>>>>> expects lower level operations to occur. This allows things
>>> > >>>>>>> like the looping and certain types of null management to be
>>> > >>>>>>> fully materialized in source code without having to deal with
>>> > >>>>>>> the complexities of ByteCode generation. It also eases testing
>>> > >>>>>>> complexity. When a particular implementation is required, the
>>> > >>>>>>> Drillbit will be responsible for generating updated method
>>> > >>>>>>> bodies as required for the record-level expressions, marking
>>> > >>>>>>> all the methods and class as final, then loading the
>>> > >>>>>>> implementation into the query-level classloader. Note that the
>>> > >>>>>>> production Drillbit will never load the template class into
>>> > >>>>>>> the JVM and will simply utilize it in ByteCode form. I was
>>> > >>>>>>> hoping someone could take a look at trying to pull together a
>>> > >>>>>>> cohesive approach to doing this using ASM and Janino (likely
>>> > >>>>>>> utilizing the JDK commons-compiler mode). The interface should
>>> > >>>>>>> be pretty simple: input is an interface, a template class
>>> > >>>>>>> name, a set of (method_signature, method_body_text) objects
>>> > >>>>>>> and a varargs of objects that are required for object
>>> > >>>>>>> instantiation. The return should be an instance of the
>>> > >>>>>>> interface. The interface should check things like: each
>>> > >>>>>>> provided method_signature matches an available method block,
>>> > >>>>>>> the method blocks being replaced are empty, the object
>>> > >>>>>>> constructor matches the set of object arguments provided by
>>> > >>>>>>> the object instantiation request, etc.
>>> > >>>>>>>
>>> > >>>>>>> *ByteBuf Improvements*
>>> > >>>>>>> - Our BufferAllocator should support child allocators
>>> > >>>>>>> (getChild()) with their own memory maximums and accounting (so
>>> > >>>>>>> we can determine the memory overhead of particular queries).
>>> > >>>>>>> We also need to be able to release entire child allocations at
>>> > >>>>>>> once.
>>> > >>>>>>> - We need to create a number of primitive type specific
>>> > >>>>>>> wrapping classes for ByteBuf. These additions include fixed
>>> > >>>>>>> offset indexing for operations (e.g. index 1 of an int buffer
>>> > >>>>>>> should be at 4 bytes), adding support for unsigned values (my
>>> > >>>>>>> preference would be to leverage the work in Guava if that
>>> > >>>>>>> makes sense) and modifying the hard bounds checks to softer
>>> > >>>>>>> assert checks to increase production performance. While we
>>> > >>>>>>> could do this utilizing the ByteBuf interface, from everything
>>> > >>>>>>> I've experienced and read, we need to minimize issues with
>>> > >>>>>>> inlining and performance so we really need to be able to
>>> > >>>>>>> modify/refer to PooledUnsafeDirectByteBuf directly for the
>>> > >>>>>>> wrapping classes. Of course, it is a final package private
>>> > >>>>>>> class. Short term, that means we really need to create a
>>> > >>>>>>> number of specific buffer types that wrap it and just put them
>>> > >>>>>>> in the io.netty.buffer package (or alternatively create a
>>> > >>>>>>> Drill version or wrapper).
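
On the child allocator item in *ByteBuf Improvements*: below is a rough sketch of what per-child maximums, accounting and release-at-once could look like. It only tracks byte counts and never allocates memory. The class name AccountingAllocator, the reserve() and releaseAll() methods, and the long argument to getChild() are all made up for illustration and are not the existing BufferAllocator API. It is also not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: byte-count accounting only, no real memory is allocated. */
final class AccountingAllocator {
    private final AccountingAllocator parent;   // null for the root allocator
    private final long maxBytes;                // this allocator's own maximum
    private final List<AccountingAllocator> children = new ArrayList<>();
    private long allocatedBytes;                // includes bytes reserved by descendants

    AccountingAllocator(long maxBytes) {
        this(null, maxBytes);
    }

    private AccountingAllocator(AccountingAllocator parent, long maxBytes) {
        this.parent = parent;
        this.maxBytes = maxBytes;
    }

    /** Creates a child with its own maximum, e.g. one child per query. */
    AccountingAllocator getChild(long childMaxBytes) {
        AccountingAllocator child = new AccountingAllocator(this, childMaxBytes);
        children.add(child);
        return child;
    }

    /** Reserves bytes here and in every ancestor; returns false if any limit would be exceeded. */
    boolean reserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be >= 0");
        }
        // Check every limit up the chain before changing any counter.
        for (AccountingAllocator a = this; a != null; a = a.parent) {
            if (a.allocatedBytes + bytes > a.maxBytes) {
                return false;
            }
        }
        for (AccountingAllocator a = this; a != null; a = a.parent) {
            a.allocatedBytes += bytes;
        }
        return true;
    }

    /** Releases everything held by this allocator and its descendants in one call. */
    void releaseAll() {
        long freed = allocatedBytes;
        for (AccountingAllocator a = parent; a != null; a = a.parent) {
            a.allocatedBytes -= freed;
        }
        resetSubtree();
    }

    private void resetSubtree() {
        allocatedBytes = 0;
        for (AccountingAllocator child : children) {
            child.resetSubtree();
        }
    }

    long getAllocatedBytes() {
        return allocatedBytes;
    }

    public static void main(String[] args) {
        AccountingAllocator root = new AccountingAllocator(100);
        AccountingAllocator queryA = root.getChild(60);
        queryA.reserve(50);
        System.out.println("root after reserve: " + root.getAllocatedBytes());
        queryA.releaseAll();
        System.out.println("root after releaseAll: " + root.getAllocatedBytes());
    }
}
```

Because each child's count rolls up into its parent, the root always shows the total and each child shows the overhead of its own query.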
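
On the primitive type wrappers: a small sketch of fixed offset indexing with assert-style checks and unsigned reads. To keep it self-contained it uses java.nio.ByteBuffer as a stand-in for Netty's ByteBuf, and Integer.toUnsignedLong (JDK 8+) in place of Guava. IntBufferView and its methods are hypothetical names, and little-endian order is an arbitrary choice.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical int-typed view; java.nio.ByteBuffer stands in for Netty's ByteBuf. */
final class IntBufferView {
    private static final int WIDTH = Integer.BYTES; // 4 bytes per value
    private final ByteBuffer buf;

    IntBufferView(int valueCount) {
        // Off-heap buffer sized for a fixed number of ints.
        this.buf = ByteBuffer.allocateDirect(valueCount * WIDTH).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Byte offset of a value index, e.g. index 1 lives at byte 4. */
    static int byteOffset(int index) {
        return index * WIDTH;
    }

    int capacity() {
        return buf.capacity() / WIDTH;
    }

    void set(int index, int value) {
        // Soft check: only evaluated when the JVM runs with -ea.
        // Note that ByteBuffer still performs its own hard bounds check underneath.
        assert index >= 0 && index < capacity() : "index out of range: " + index;
        buf.putInt(byteOffset(index), value);
    }

    int get(int index) {
        assert index >= 0 && index < capacity() : "index out of range: " + index;
        return buf.getInt(byteOffset(index));
    }

    /** Reads the same 32 bits as an unsigned value, widened to long. */
    long getUnsigned(int index) {
        return Integer.toUnsignedLong(get(index));
    }

    public static void main(String[] args) {
        IntBufferView view = new IntBufferView(4);
        view.set(1, -1);
        System.out.println("signed: " + view.get(1));
        System.out.println("unsigned: " + view.getUnsigned(1));
        System.out.println("byte offset of index 1: " + byteOffset(1));
    }
}
```

A real version would need to sit on the pooled Netty buffer to get the inlining benefits described above. This only shows the indexing arithmetic and where the softer checks would go.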
