Ya, just bringing that up again. Doubt it will be a blocker.

Tim
On Fri, Apr 26, 2013 at 10:12 AM, David Alves <[email protected]> wrote:

> good point, i'll try and ask the author.
> it's a pretty recent lib so that might be an oversight…
>
> -david
>
> On Apr 26, 2013, at 12:04 PM, Timothy Chen <[email protected]> wrote:
>
> > Jacques, I think this is the one I emailed you before that has no licensing info.
> >
> > Tim
> >
> > Sent from my iPhone
> >
> > On Apr 26, 2013, at 9:30 AM, David Alves <[email protected]> wrote:
> >
> >> i've looked through it and it looks like it can leverage shared memory, which I was looking for anyway.
> >> I also like the way garbage collection works (gc in java also clears off-heap).
> >> I'll take a deeper look during the weekend.
> >>
> >> -david
> >>
> >> On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]> wrote:
> >>
> >>> I've looked at that in the past and think the idea of using it here is very good. ByteBuf seems nice as it has things like endianness capabilities, reference counting and management, and direct Netty support. On the flip side, larray is nice for its large-array capabilities and better input/output interfaces. The best approach might be to define a new ByteBuf implementation that leverages LArray. I'll take a look at this in a few days if someone else doesn't want to.
> >>>
> >>> j
> >>>
> >>> On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:
> >>>
> >>>> For *ByteBuf Improvements*, have you looked at LArrayJ?
> >>>> https://github.com/xerial/larray
> >>>> It has those wrappers and I found it quite useful. The same person has also written a Java version of Snappy compression. Not sure if you have plans to add compression, but one of the nice things I could do was use the memory offsets for the source (compressed data) and dest (uncompressed array) and do the decompression off-heap.
> >>>> It supports the need for looking up by index and has wrappers for most of the primitive data types.
> >>>>
> >>>> Are you looking at something like this?
> >>>>
> >>>> thanks,
> >>>> Kishore G
> >>>>
> >>>> On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]> wrote:
> >>>>
> >>>>> They are on the list but the list is long :)
> >>>>>
> >>>>> Have a good weekend.
> >>>>>
> >>>>> On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:
> >>>>>
> >>>>>> So if no one picks anything up, you will be done with all the work in the next couple of days? :)
> >>>>>>
> >>>>>> Would like to help out, but I'm traveling to LA over the weekend.
> >>>>>>
> >>>>>> I'll sync with you Monday to see how I can help then.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>> On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]> wrote:
> >>>>>>
> >>>>>>> I'm working on the execwork stuff, and if someone would like to help out, here are a couple of things that need doing. I figured I'd drop them here and see if anyone wants to work on them in the next couple of days. If so, let me know; otherwise I'll be picking them up soon.
> >>>>>>>
> >>>>>>> *RPC*
> >>>>>>> - RPC Layer Handshakes: Currently, I haven't implemented the handshake that should happen in either the User <> Bit or the Bit <> Bit layer. The plan was to use an additional inserted event handler that removes itself from the event pipeline after a successful handshake, or disconnects the channel on a failed handshake (with appropriate logging). The main validation at this point will simply be confirming that both endpoints are running on the same protocol version.
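A minimal sketch of the self-removing handshake handler described above. The `Pipeline`/`Handler` abstraction and `RPC_VERSION` constant are hypothetical stand-ins; real code would use Netty's `ChannelPipeline` and remove the handler via the pipeline API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HandshakeSketch {
    static final int RPC_VERSION = 1; // hypothetical protocol version constant

    interface Handler { void onMessage(Pipeline p, Object msg); }

    // Toy stand-in for a Netty ChannelPipeline: first handler sees each message.
    static final class Pipeline {
        final Deque<Handler> handlers = new ArrayDeque<>();
        boolean open = true;
        void fire(Object msg) {
            if (open && !handlers.isEmpty()) handlers.peekFirst().onMessage(this, msg);
        }
    }

    /** Validates the peer's version, then removes itself or disconnects. */
    static final class HandshakeHandler implements Handler {
        public void onMessage(Pipeline p, Object msg) {
            int peerVersion = (Integer) msg;
            if (peerVersion == RPC_VERSION) {
                p.handlers.removeFirst();   // success: drop out of the pipeline
            } else {
                p.open = false;             // failure: disconnect the channel
            }
        }
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline();
        p.handlers.addFirst(new HandshakeHandler());
        p.fire(RPC_VERSION);                // matching version: handler removes itself
        System.out.println("open=" + p.open + " handlers=" + p.handlers.size());
    }
}
```

After a successful handshake the handler is gone, so later messages pay no handshake cost; a mismatched version closes the (toy) channel instead.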
> >>>>>>> The only other information that is currently needed is that in Bit <> Bit communication, the client should inform the server of its DrillEndpoint so that the server can map that for future communication in the other direction.
> >>>>>>>
> >>>>>>> *DataTypes*
> >>>>>>> - General Expansion: Currently, we have a hodgepodge of datatypes within org.apache.drill.common.expression.types.DataType. We need to clean this up. There should be types that map to standard SQL types. My thinking is that we should actually have separate types for each of nullable, non-nullable and repeated (optional, required and repeated in protobuf vernacular), since we'll generally operate on those values completely differently (and each type should reveal which it is). We should also have a relationship mapping from each to the other (e.g. how to convert a signed 32-bit int into a nullable signed 32-bit int).
> >>>>>>>
> >>>>>>> - Map Types: We don't need nullable, but we will need different map types: inline and fieldwise. I think these will be useful for the execution engine and will be leveraged depending on the particular needs -- for example, fieldwise will be a natural fit where we're operating on columnar data and doing an explode or other fieldwise nested operation, and inline will be useful when we're doing things like sorting a complex field. Inline will also be appropriate where we have extremely sparse record sets. We'll just need transformation methods between the two variations. In the case of a fieldwise map type field, the field is virtual and only exists to contain its child fields.
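The separate-type-per-mode idea above, with the relationship mapping between modes, could look roughly like this sketch. `Mode`, `Kind` and `MajorType` are illustrative names, not Drill's actual classes, and the kind list is trimmed to two entries.

```java
public class TypeModeSketch {
    enum Mode { REQUIRED, OPTIONAL, REPEATED }   // protobuf vernacular
    enum Kind { INT32, VARCHAR }                 // trimmed to two kinds for the sketch

    // One concrete type per (kind, mode) pair, so the type reveals its mode.
    static final class MajorType {
        final Kind kind;
        final Mode mode;
        MajorType(Kind kind, Mode mode) { this.kind = kind; this.mode = mode; }

        /** The mapping called out above: required int -> nullable int, etc. */
        MajorType asOptional() { return new MajorType(kind, Mode.OPTIONAL); }

        @Override public String toString() { return mode + " " + kind; }
    }

    public static void main(String[] args) {
        MajorType requiredInt = new MajorType(Kind.INT32, Mode.REQUIRED);
        System.out.println(requiredInt.asOptional()); // OPTIONAL INT32
    }
}
```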
> >>>>>>>
> >>>>>>> - Non-static DataTypes: We need types that don't fit the static data type model above. Examples include fixed-width types (e.g. a 10-byte string), polymorphic (inline encoded) types (number or string depending on the record) and repeated nested versions of our other types. These are a little more gnarly, as we need to support canonicalization of these. Optiq has some methods for how to handle this kind of type system, so it probably makes sense to leverage that system.
> >>>>>>>
> >>>>>>> *Expression Type Materialization*
> >>>>>>> - LogicalExpression type materialization: Right now, LogicalExpressions include support for late type binding. As part of the record batch execution path, these need to get materialized with correct casting, etc. based on the actual found schema. As such, we need a function which takes a LogicalExpression tree, applies a materialized BatchSchema and returns a new LogicalExpression tree with full type settings. As part of this process, all types need to be cast as necessary and full validation of the tree should be done. Timothy has pending work on a pull request, specifically for validation, that would be a good piece of code to leverage for that need. We also have a visitor model for the expression tree that should be able to aid in the updated LogicalExpression construction.
> >>>>>>>
> >>>>>>> - LogicalExpression to Java expression conversion: We need to be able to convert our logical expressions into Java code expressions. Initially, this should be done in a simplistic way, using something like implicit boxing and the like just to get something working.
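The materialization pass described above might be sketched like this: resolve each late-bound field against the found schema and wrap it in a cast where the requested type differs. `Expr`, `FieldRef`, `Cast` and the string-typed schema are all simplifications invented for this sketch, not Drill's LogicalExpression API.

```java
import java.util.Map;

public class MaterializeSketch {
    interface Expr { String type(); }

    // A field reference whose type has been resolved from the schema.
    static final class FieldRef implements Expr {
        final String name; final String resolved;
        FieldRef(String name, String resolved) { this.name = name; this.resolved = resolved; }
        public String type() { return resolved; }
    }

    // A cast inserted by materialization when the found type differs.
    static final class Cast implements Expr {
        final Expr input; final String target;
        Cast(Expr input, String target) { this.input = input; this.target = target; }
        public String type() { return target; }
    }

    /** Returns a new, fully typed expression, casting where necessary. */
    static Expr materialize(String fieldName, String wantedType, Map<String, String> schema) {
        String found = schema.get(fieldName);
        if (found == null) throw new IllegalArgumentException("unknown field: " + fieldName);
        Expr typed = new FieldRef(fieldName, found);
        return found.equals(wantedType) ? typed : new Cast(typed, wantedType);
    }

    public static void main(String[] args) {
        Map<String, String> schema = Map.of("a", "INT32");
        System.out.println(materialize("a", "INT64", schema).type()); // INT64, via a cast
    }
}
```

A full version would walk an arbitrary tree with the existing visitor model rather than a single field, but the shape (old tree in, new typed tree out) is the same.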
> >>>>>>> This will likely be specialized per major type (nullable, non-nullable and repeated), and a framework might make the most sense, actually just distinguishing LogicalExpressions by these types.
> >>>>>>>
> >>>>>>> *JDBC*
> >>>>>>> - The Drill JDBC driver layer needs to be updated to leverage our ZooKeeper coordination locations so that it can correctly find the cluster location.
> >>>>>>> - The Drill JDBC driver should also manage reconnects, so that if it loses its connection with a particular Drillbit partner, it will reconnect to another available node in the cluster.
> >>>>>>> - Someone should point SQuirreL at Julian's latest work and see how things go...
> >>>>>>>
> >>>>>>> *ByteCode Engineering*
> >>>>>>> - We need to put together a concrete class materialization strategy. My thinking for relational operators and code generation is that in most cases, we'll have an interface and a template class for a particular relational operator. We will build a template class that has all the generic stuff implemented but makes calls to empty methods where it expects lower-level operations to occur. This allows things like looping and certain types of null management to be fully materialized in source code without having to deal with the complexities of bytecode generation. It also eases testing complexity. When a particular implementation is required, the Drillbit will be responsible for generating updated method bodies as required for the record-level expressions, marking all the methods and the class as final, then loading the implementation into the query-level classloader.
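The template-class strategy above can be sketched in plain Java: the template implements the generic loop and calls an empty hook method that generated code later fills in. Here a hand-written subclass stands in for the bytecode-generated one, and all names (`Filterer`, `FilterTemplate`, `doEval`) are illustrative.

```java
public class TemplateSketch {
    interface Filterer { int filter(int[] batch); }

    /** Template: looping is fully materialized in source; doEval is the empty hook. */
    static abstract class FilterTemplate implements Filterer {
        public final int filter(int[] batch) {
            int kept = 0;
            for (int value : batch) {
                if (doEval(value)) kept++;      // record-level expression goes here
            }
            return kept;
        }
        protected boolean doEval(int value) { return false; }  // empty method body
    }

    /** Stand-in for a generated implementation: only the hook body differs. */
    static final class GeneratedFilter extends FilterTemplate {
        @Override protected boolean doEval(int value) { return value > 10; }
    }

    public static void main(String[] args) {
        System.out.println(new GeneratedFilter().filter(new int[]{5, 11, 42})); // 2
    }
}
```

Because the loop and null handling live in ordinary source, they can be unit-tested directly; only the tiny hook bodies would come from the bytecode generator.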
> >>>>>>> Note that the production Drillbit will never load the template class into the JVM and will simply utilize it in bytecode form. I was hoping someone could take a look at trying to pull together a cohesive approach to doing this using ASM and Janino (likely utilizing the JDK commons-compiler mode). The interface should be pretty simple: input is an interface, a template class name, a set of (method_signature, method_body_text) objects and a varargs of objects that are required for object instantiation. The return should be an instance of the interface. The implementation should check things like the provided method_signatures against the available method blocks, that the method blocks being replaced are empty, that the object constructor matches the set of object arguments provided by the instantiation request, etc.
> >>>>>>>
> >>>>>>> *ByteBuf Improvements*
> >>>>>>> - Our BufferAllocator should support child allocators (getChild()) with their own memory maximums and accounting (so we can determine the memory overhead of particular queries). We also need to be able to release entire child allocations at once.
> >>>>>>> - We need to create a number of primitive-type-specific wrapping classes for ByteBuf. These additions include fixed-offset indexing for operations (e.g. index 1 of an int buffer should be at 4 bytes), adding support for unsigned values (my preference would be to leverage the work in Guava if that makes sense) and modifying the hard bounds checks to softer assert checks to increase production performance.
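The child-allocator idea above, reduced to its accounting core: each child gets its own maximum and usage counter, and `close()` releases the whole child allocation at once. This is an accounting-only illustration with hypothetical names; it deliberately omits charging the parent for child usage, which a real allocator would also do.

```java
import java.util.concurrent.atomic.AtomicLong;

public class AllocatorSketch {
    static class BufferAllocator {
        final long maximum;                      // this allocator's memory cap
        final AtomicLong used = new AtomicLong();
        BufferAllocator(long maximum) { this.maximum = maximum; }

        /** Child with its own cap, so per-query overhead can be measured. */
        BufferAllocator getChild(long childMax) { return new BufferAllocator(childMax); }

        /** Reserve bytes against this allocator's cap; roll back on overflow. */
        boolean allocate(long bytes) {
            long now = used.addAndGet(bytes);
            if (now > maximum) { used.addAndGet(-bytes); return false; }
            return true;
        }

        /** Release the entire allocation (e.g. a whole query's memory) at once. */
        void close() { used.set(0); }
    }

    public static void main(String[] args) {
        BufferAllocator root = new BufferAllocator(1 << 20);
        BufferAllocator query = root.getChild(1024);
        System.out.println(query.allocate(512));   // true: under the child max
        System.out.println(query.allocate(1024));  // false: would exceed the child max
        query.close();
        System.out.println(query.used.get());      // 0: whole child allocation released
    }
}
```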
> >>>>>>> While we could do this utilizing the ByteBuf interface, from everything I've experienced and read, we need to minimize issues with inlining and performance, so we really need to be able to modify/refer to PooledUnsafeDirectByteBuf directly for the wrapping classes. Of course, it is a final, package-private class. Short term, that means we really need to create a number of specific buffer types that wrap it and just put them in the io.netty.buffer package (or alternatively create a Drill version or wrapper).
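The fixed-offset wrapper idea above might look like this sketch: index 1 of an int view lives at byte offset 4, and bounds are checked with soft asserts (disabled in production unless `-ea` is set) rather than hard exceptions. `java.nio.ByteBuffer` stands in here for Netty's `PooledUnsafeDirectByteBuf`, and `IntWrapper` is an invented name.

```java
import java.nio.ByteBuffer;

public class IntWrapperSketch {
    static final class IntWrapper {
        private final ByteBuffer buf;
        IntWrapper(ByteBuffer buf) { this.buf = buf; }

        int get(int index) {
            // Soft bounds check: free when assertions are disabled in production.
            assert index >= 0 && index * 4 < buf.capacity() : "index " + index;
            return buf.getInt(index * 4);       // fixed offset: index * 4 bytes
        }

        void set(int index, int value) {
            assert index >= 0 && index * 4 < buf.capacity() : "index " + index;
            buf.putInt(index * 4, value);
        }
    }

    public static void main(String[] args) {
        IntWrapper ints = new IntWrapper(ByteBuffer.allocateDirect(16)); // room for 4 ints
        ints.set(1, 42);                        // written at byte offset 4
        System.out.println(ints.get(1));        // 42
    }
}
```

Equivalent wrappers for the other primitive widths would only differ in the element size and the get/put calls used.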
