Good news, the author of larray got back and he will add the Apache license to the source.

On Apr 26, 2013 11:13 AM, "kishore g" <[email protected]> wrote:
I have interacted with the author; let me know if you want me to check. The good thing is that he is responsive and even added a few things for me.

On Fri, Apr 26, 2013 at 10:27 AM, Timothy Chen <[email protected]> wrote:

Ya, just bringing that up again. Doubt it will be a blocker.

Tim

On Fri, Apr 26, 2013 at 10:12 AM, David Alves <[email protected]> wrote:

Good point, I'll try and ask the author. It's a pretty recent lib, so that might be an oversight…

-david

On Apr 26, 2013, at 12:04 PM, Timothy Chen <[email protected]> wrote:

Jacques, I think this is the one I emailed you before that has no licensing info.

Tim

Sent from my iPhone

On Apr 26, 2013, at 9:30 AM, David Alves <[email protected]> wrote:

I've looked through it, and it looks like it can leverage shared memory, which I was looking for anyway. I also like the way garbage collection works (GC in Java also clears off-heap). I'll take a deeper look during the weekend.

-david

On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]> wrote:

I've looked at that in the past and think the idea of using it here is very good. ByteBuf is nice, as it has things like endianness capabilities, reference counting and management, and direct Netty support. On the flip side, larray is nice for its large-array capabilities and better input/output interfaces. The best approach might be to define a new ByteBuf implementation that leverages LArray. I'll take a look at this in a few days if someone else doesn't want to.
j

On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:

For *ByteBuf Improvements*, have you looked at LArrayJ (https://github.com/xerial/larray)? It has those wrappers, and I found it quite useful. The same person has also written a Java version of Snappy compression. Not sure if you have plans to add compression, but one of the nice things I could do was use the memory offsets for the source (compressed data) and dest (uncompressed array) and do the decompression off-heap. It supports the need for looking up by index and has wrappers for most of the primitive data types.

Are you looking at something like this?

thanks,
Kishore G

On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]> wrote:

They are on the list, but the list is long :)

Have a good weekend.

On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:

So if no one picks anything up, you will be done with all the work in the next couple of days? :)

I would like to help out, but I'm traveling to LA over the weekend. I'll sync with you Monday to see how I can help then.

Tim

Sent from my iPhone

On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]> wrote:

I'm working on the execwork stuff, and if someone would like to help out, here are a couple of things that need doing. I figured I'd drop them here and see if anyone wants to work on them in the next couple of days.
If so, let me know; otherwise I'll be picking them up soon.

*RPC*
- RPC Layer Handshakes: Currently, I haven't implemented the handshake that should happen in either the User <> Bit or the Bit <> Bit layer. The plan was to use an additional inserted event handler that removes itself from the event pipeline after a successful handshake, or disconnects the channel on a failed handshake (with appropriate logging). The main validation at this point will simply be confirming that both endpoints are running on the same protocol version. The only other information currently needed is that, in Bit <> Bit communication, the client should inform the server of its DrillEndpoint so that the server can map that for future communication in the other direction.

*DataTypes*
- General Expansion: Currently, we have a hodgepodge of datatypes within org.apache.drill.common.expression.types.DataType. We need to clean this up. There should be types that map to standard SQL types. My thinking is that we should actually have separate types for each of nullable, non-nullable, and repeated (required, optional, and repeated in protobuf vernacular), since we'll generally operate on those values completely differently (and each type should reveal which it is). We should also have a relationship mapping from each to the other (e.g.
how to convert a signed 32-bit int into a nullable signed 32-bit int).

- Map Types: We don't need nullable, but we will need different map types: inline and fieldwise. I think these will be useful for the execution engine and will be leveraged depending on the particular needs. For example, fieldwise will be a natural fit where we're operating on columnar data and doing an explode or other fieldwise nested operation, while inline will be useful when we're doing things like sorting a complex field. Inline will also be appropriate where we have extremely sparse record sets. We'll just need transformation methods between the two variations. In the case of a fieldwise map type field, the field is virtual and only exists to contain its child fields.

- Non-static DataTypes: We need types that don't fit the static data type model above. Examples include fixed-width types (e.g. a 10-byte string), polymorphic (inline-encoded) types (number or string depending on the record), and repeated nested versions of our other types. These are a little more gnarly, as we need to support canonicalization of them. Optiq has some methods for handling this kind of type system, so it probably makes sense to leverage that.

*Expression Type Materialization*
- LogicalExpression type materialization: Right now, LogicalExpressions include support for late type binding.
As part of the record batch execution path, these need to get materialized with correct casting, etc., based on the actual found schema. As such, we need a function that takes a LogicalExpression tree, applies a materialized BatchSchema, and returns a new LogicalExpression tree with full type settings. As part of this process, all types need to be cast as necessary, and full validation of the tree should be done. Timothy has pending validation work on a pull request that would be a good piece of code to leverage for that need. We also have a visitor model for the expression tree that should be able to aid in constructing the updated LogicalExpression.

- LogicalExpression to Java expression conversion: We need to be able to convert our logical expressions into Java code expressions. Initially, this should be done in a simplistic way, using things like implicit boxing just to get something working. This will likely be specialized per major type (nullable, non-nullable, and repeated), and a framework that just distinguishes LogicalExpressions by these types might actually make the most sense.

*JDBC*
- The Drill JDBC driver layer needs to be updated to leverage our ZooKeeper coordination locations so that it can correctly find the cluster location.
- The Drill JDBC driver should also manage reconnects, so that if it loses its connection with a particular Drillbit partner, it will reconnect to another available node in the cluster.
- Someone should point SQuirreL at Julian's latest work and see how things go...

*ByteCode Engineering*
- We need to put together a concrete class materialization strategy. My thinking for relational operators and code generation is that in most cases we'll have an interface and a template class for a particular relational operator. We will build a template class that has all the generic stuff implemented but makes calls to empty methods where it expects lower-level operations to occur. This allows things like looping and certain types of null management to be fully materialized in source code without having to deal with the complexities of bytecode generation. It also eases testing complexity. When a particular implementation is required, the Drillbit will be responsible for generating updated method bodies as required for the record-level expressions, marking all the methods and the class as final, and then loading the implementation into the query-level classloader. Note that the production Drillbit will never load the template class into the JVM and will simply utilize it in ByteCode form.
I was hoping someone could take a look at trying to pull together a cohesive approach to doing this using ASM and Janino (likely utilizing the JDK commons-compiler mode). The interface should be pretty simple: input is an interface, a template class name, a set of (method_signature, method_body_text) objects, and a varargs of objects that are required for object instantiation. The return should be an instance of the interface. The implementation should check things like the method_signature provided against the available method blocks, that the method blocks being replaced are empty, that the object constructor matches the set of object arguments provided in the instantiation request, etc.

*ByteBuf Improvements*
- Our BufferAllocator should support child allocators (getChild()) with their own memory maximums and accounting (so we can determine the memory overhead of particular queries). We also need to be able to release entire child allocations at once.
- We need to create a number of primitive-type-specific wrapping classes for ByteBuf. These additions include fixed offset indexing for operations (e.g. index 1 of an int buffer should be at 4 bytes), adding support for unsigned values (my preference would be to leverage the work in Guava if that makes sense), and modifying the hard bounds checks to softer assert checks to increase production performance.
While we could do this utilizing the ByteBuf interface, from everything I've experienced and read, we need to minimize issues with inlining and performance, so we really need to be able to modify/refer to PooledUnsafeDirectByteBuf directly for the wrapping classes. Of course, it is a final package-private class. Short term, that means we really need to create a number of specific buffer types that wrap it and just put them in the io.netty.buffer package (or alternatively create a Drill version or wrapper).
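The "separate types per mode" idea from the *DataTypes* section above can be sketched in a few lines. This is a hypothetical illustration only, not Drill's actual type code; the names `MajorTypeSketch`, `Mode`, and `asMode` are invented for the example.

```java
// Hypothetical sketch: every type carries a minor type (e.g. INT32)
// plus a mode (required/optional/repeated, in protobuf vernacular),
// with a relationship mapping from each mode to the others, e.g.
// signed 32-bit int -> nullable signed 32-bit int.
class MajorTypeSketch {
  enum Mode { REQUIRED, OPTIONAL, REPEATED }

  final String minorType; // e.g. "INT32"
  final Mode mode;

  MajorTypeSketch(String minorType, Mode mode) {
    this.minorType = minorType;
    this.mode = mode;
  }

  // Relationship mapping: same minor type, different mode.
  MajorTypeSketch asMode(Mode newMode) {
    return new MajorTypeSketch(minorType, newMode);
  }

  @Override public String toString() { return minorType + ":" + mode; }
}
```

Because the mode is part of the type itself, code that handles nullable values differently from required ones can dispatch on it directly, which is the point of the proposal.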
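The *RPC* handshake described above boils down to a protocol-version check plus, for Bit <> Bit, registering the client's endpoint for return traffic. A minimal sketch of just the validation step, with an invented `Hello` message (Drill's real handshake messages and the pipeline handler are not shown):

```java
// Hypothetical handshake validation: confirm both endpoints run the
// same protocol version; in Bit <> Bit communication the client also
// sends its DrillEndpoint so the server can map it for communication
// in the other direction. On a mismatch, the handshake handler would
// disconnect the channel (with appropriate logging).
class HandshakeSketch {
  static final int PROTOCOL_VERSION = 1;

  static final class Hello {
    final int protocolVersion;
    final String endpoint; // client's DrillEndpoint ("host:port"), Bit <> Bit only

    Hello(int protocolVersion, String endpoint) {
      this.protocolVersion = protocolVersion;
      this.endpoint = endpoint;
    }
  }

  /** Returns the endpoint to register on success, or null to signal disconnect. */
  static String validate(Hello hello) {
    if (hello.protocolVersion != PROTOCOL_VERSION) {
      return null; // versions differ: drop the channel
    }
    return hello.endpoint;
  }
}
```

In the real pipeline, the handler performing this check would remove itself after a successful handshake, as the section describes.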
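The child-allocator idea under *ByteBuf Improvements* (per-child maximums, per-query accounting, release-all-at-once) could look roughly like the following. The class and method names are invented, it tracks a plain byte counter rather than real buffers, and the parent/child accounting linkage is elided for brevity.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical accounting-only sketch: getChild() creates an allocator
// with its own maximum, allocate() tracks bytes against that maximum
// (making per-query memory overhead visible), and close() releases the
// entire child allocation at once.
class AllocatorSketch {
  private final AtomicLong allocated = new AtomicLong();
  private final long max;

  AllocatorSketch(long max) { this.max = max; }

  AllocatorSketch getChild(long childMax) { return new AllocatorSketch(childMax); }

  /** Reserve bytes; returns false if this allocator's maximum would be exceeded. */
  boolean allocate(long bytes) {
    long now = allocated.addAndGet(bytes);
    if (now > max) {
      allocated.addAndGet(-bytes); // roll back the failed reservation
      return false;
    }
    return true;
  }

  long getAllocated() { return allocated.get(); }

  /** Release the whole child allocation in one shot. */
  void close() { allocated.set(0); }
}
```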
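The primitive wrapper idea (fixed offset indexing plus soft assert bounds checks) can be illustrated with `java.nio.ByteBuffer` standing in for Netty's PooledUnsafeDirectByteBuf, which, as noted above, is a final package-private class. `IntBufView` is an invented name for the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical int wrapper: logical index i maps to byte offset 4*i
// (so index 1 of an int buffer is at byte 4), and bounds are checked
// with soft asserts (compiled out unless -ea is set) rather than hard
// checks, to keep the production fast path cheap.
class IntBufView {
  private final ByteBuffer buf;
  private final int capacity; // capacity in ints, not bytes

  IntBufView(int nInts) {
    this.buf = ByteBuffer.allocateDirect(nInts * 4).order(ByteOrder.LITTLE_ENDIAN);
    this.capacity = nInts;
  }

  void set(int index, int value) {
    assert index >= 0 && index < capacity : "index out of range: " + index;
    buf.putInt(index * 4, value); // fixed-offset indexing
  }

  int get(int index) {
    assert index >= 0 && index < capacity : "index out of range: " + index;
    return buf.getInt(index * 4);
  }
}
```

The same pattern would repeat per primitive type (long at offset 8*i, short at 2*i, and so on), which is why the section talks about "a number of" specific buffer types.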
