I've looked at that in the past and think the idea of using it here is
very good.  ByteBuf is nice as it has things like endianness handling,
reference counting and management, and direct Netty support.  On the flip
side, LArray is nice for its large-array capabilities and better
input/output interfaces.  The best approach might be to define a new
ByteBuf implementation that leverages LArray.  I'll take a look at this
in a few days if someone else doesn't want to.
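
As a very rough sketch of the shape I have in mind (not compilable as is:
ByteBuf has many more abstract methods than shown, and the LBuffer class
and method names are my guesses at the LArrayJ API, so they need to be
verified):

    // Sketch only: an LArray-backed ByteBuf.  The real class would extend
    // io.netty.buffer.ByteBuf and implement its full contract; only the
    // core delegation idea is shown here.
    public final class LArrayByteBuf /* extends ByteBuf */ {
      private final xerial.larray.buffer.LBuffer buffer;  // assumed API

      public LArrayByteBuf(xerial.larray.buffer.LBuffer buffer) {
        this.buffer = buffer;
      }

      public byte getByte(long index) {
        return buffer.getByte(index);   // off-heap read, 64-bit index
      }

      public void setByte(long index, byte value) {
        buffer.putByte(index, value);   // off-heap write, 64-bit index
      }

      public void release() {
        buffer.release();               // frees the off-heap allocation
      }
    }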

j

On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]> wrote:

> For *ByteBuf Improvements*, have you looked at LArrayJ
> (https://github.com/xerial/larray)? It has those wrappers and I found it
> quite useful. The same person has also written a Java version of Snappy
> compression. Not sure if you guys plan to add compression, but one of
> the nice things I could do was use the memory offsets for the source
> (compressed data) and destination (uncompressed array) and do the
> decompression off-heap. It supports the need for looking up by index and
> has wrappers for most of the primitive data types.
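>
> For example, using snappy-java's direct-buffer path (I believe the
> ByteBuffer overload below exists, but the exact signature is worth
> double-checking):
>
>     import java.io.IOException;
>     import java.nio.ByteBuffer;
>     import org.xerial.snappy.Snappy;
>
>     // Both buffers are direct, so decompression never copies through a
>     // byte[] on the Java heap.
>     static ByteBuffer decompress(ByteBuffer compressed, int maxLen)
>         throws IOException {
>       ByteBuffer uncompressed = ByteBuffer.allocateDirect(maxLen);
>       Snappy.uncompress(compressed, uncompressed);
>       return uncompressed;
>     }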
>
> Are you looking at something like this?
>
> thanks,
> Kishore G
>
>
>
> On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]>
> wrote:
>
> > They are on the list but the list is long :)
> >
> > Have a good weekend.
> >
> > On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]> wrote:
> >
> > > So if no one picks anything up, you will be done with all the work
> > > in the next couple of days? :)
> > >
> > > Would like to help out but I'm traveling to LA over the weekend.
> > >
> > > I'll sync with you Monday to see how I can help then.
> > >
> > > Tim
> > >
> > > Sent from my iPhone
> > >
> > > On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]>
> > > wrote:
> > >
> > > > I'm working on the execwork stuff and if someone would like to
> > > > help out, here are a couple of things that need doing.  I figured
> > > > I'd drop them here and see if anyone wants to work on them in the
> > > > next couple of days.  If so, let me know; otherwise I'll be
> > > > picking them up soon.
> > > >
> > > > *RPC*
> > > > - RPC Layer Handshakes: Currently, I haven't implemented the
> > > > handshake that should happen in either the User <> Bit or the
> > > > Bit <> Bit layer.  The plan was to use an additional inserted
> > > > event handler that removes itself from the event pipeline after a
> > > > successful handshake, or disconnects the channel on a failed
> > > > handshake (with appropriate logging).  The main validation at this
> > > > point will simply be confirming that both endpoints are running on
> > > > the same protocol version.  The only other information currently
> > > > needed is that in Bit <> Bit communication, the client should
> > > > inform the server of its DrillEndpoint so that the server can map
> > > > that for future communication in the other direction.
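> > > >
> > > > Roughly something like this (all names are placeholders rather
> > > > than existing Drill classes; assumes Netty 4's
> > > > ChannelInboundHandlerAdapter):
> > > >
> > > >     import io.netty.channel.ChannelHandlerContext;
> > > >     import io.netty.channel.ChannelInboundHandlerAdapter;
> > > >     import org.slf4j.Logger;
> > > >     import org.slf4j.LoggerFactory;
> > > >
> > > >     public class HandshakeHandler extends ChannelInboundHandlerAdapter {
> > > >       // Stand-in for the decoded handshake message type.
> > > >       public interface Handshake { int getVersion(); }
> > > >
> > > >       private static final Logger logger =
> > > >           LoggerFactory.getLogger(HandshakeHandler.class);
> > > >       private final int protocolVersion;
> > > >
> > > >       public HandshakeHandler(int protocolVersion) {
> > > >         this.protocolVersion = protocolVersion;
> > > >       }
> > > >
> > > >       @Override
> > > >       public void channelRead(ChannelHandlerContext ctx, Object msg) {
> > > >         Handshake handshake = (Handshake) msg;
> > > >         if (handshake.getVersion() == protocolVersion) {
> > > >           ctx.pipeline().remove(this);  // success: leave the pipeline
> > > >         } else {
> > > >           logger.error("Handshake failed: protocol version mismatch");
> > > >           ctx.close();                  // failure: disconnect
> > > >         }
> > > >       }
> > > >     }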
> > > >
> > > > *DataTypes*
> > > > - General Expansion: Currently, we have a hodgepodge of datatypes
> > > > within org.apache.drill.common.expression.types.DataType.  We need
> > > > to clean this up.  There should be types that map to standard SQL
> > > > types.  My thinking is that we should actually have separate types
> > > > for nullable, non-nullable and repeated (optional, required and
> > > > repeated in protobuf vernacular) since we'll generally operate on
> > > > those values completely differently (and each type should reveal
> > > > which it is).  We should also have a relationship mapping from
> > > > each to the other (e.g. how to convert a signed 32 bit int into a
> > > > nullable signed 32 bit int).
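> > > >
> > > > One possible shape (names are placeholders; this just illustrates
> > > > a mode per type and the mapping between modes):
> > > >
> > > >     public enum DataMode { REQUIRED, OPTIONAL, REPEATED }
> > > >
> > > >     public enum MinorType { INT, BIGINT, FLOAT8, VARCHAR /* ... */ }
> > > >
> > > >     // A concrete type is a (minor type, mode) pair, so nullable
> > > >     // and non-nullable variants are distinct and self-describing.
> > > >     public final class MajorType {
> > > >       private final MinorType minorType;
> > > >       private final DataMode mode;
> > > >
> > > >       public MajorType(MinorType minorType, DataMode mode) {
> > > >         this.minorType = minorType;
> > > >         this.mode = mode;
> > > >       }
> > > >
> > > >       public DataMode getMode() { return mode; }
> > > >
> > > >       // e.g. convert a required (non-nullable) signed 32 bit int
> > > >       // into its nullable counterpart
> > > >       public MajorType asOptional() {
> > > >         return new MajorType(minorType, DataMode.OPTIONAL);
> > > >       }
> > > >     }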
> > > >
> > > > - Map Types: We don't need nullable maps, but we will need
> > > > different map types: inline and fieldwise.  I think these will be
> > > > useful for the execution engine and will be leveraged depending on
> > > > particular needs.  For example, fieldwise will be a natural fit
> > > > where we're operating on columnar data and doing an explode or
> > > > other fieldwise nested operation, while inline will be useful when
> > > > we're doing things like sorting a complex field.  Inline will also
> > > > be appropriate where we have extremely sparse record sets.  We'll
> > > > just need transformation methods between the two variations.  In
> > > > the case of a fieldwise map type field, the field is virtual and
> > > > only exists to contain its child fields.
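> > > >
> > > > Illustrative only (ValueVector is a stub standing in for whatever
> > > > columnar vector type we land on):
> > > >
> > > >     interface ValueVector {}  // stub for a real columnar vector
> > > >
> > > >     // Fieldwise: the map field is virtual; each child field is
> > > >     // stored as its own columnar vector.
> > > >     class FieldwiseMap {
> > > >       java.util.Map<String, ValueVector> children;
> > > >     }
> > > >
> > > >     // Inline: each record's whole map value sits contiguously,
> > > >     // which suits sorting on the complex value as a unit and very
> > > >     // sparse record sets.
> > > >     class InlineMap {
> > > >       io.netty.buffer.ByteBuf data;  // map values, record by record
> > > >       int[] offsets;                 // start of each record's map
> > > >     }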
> > > >
> > > > - Non-static DataTypes: We need types that don't fit the static
> > > > data type model above.  Examples include fixed width types (e.g. a
> > > > 10 byte string), polymorphic (inline encoded) types (number or
> > > > string depending on the record) and repeated nested versions of
> > > > our other types.  These are a little more gnarly as we need to
> > > > support canonicalization of them.  Optiq has some methods for
> > > > handling this kind of type system, so it probably makes sense to
> > > > leverage that.
> > > >
> > > > *Expression Type Materialization*
> > > > - LogicalExpression type materialization: Right now,
> > > > LogicalExpressions include support for late type binding.  As part
> > > > of the record batch execution path, these need to get materialized
> > > > with correct casting, etc., based on the actual found schema.  As
> > > > such, we need a function which takes a LogicalExpression tree,
> > > > applies a materialized BatchSchema and returns a new
> > > > LogicalExpression tree with full type settings.  As part of this
> > > > process, all types need to be cast as necessary and full
> > > > validation of the tree should be done.  Timothy has pending
> > > > validation work on a pull request that would be a good piece of
> > > > code to leverage for this.  We also have a visitor model for the
> > > > expression tree that should be able to aid in constructing the
> > > > updated LogicalExpression.
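> > > >
> > > > The entry point could be as simple as the following (the interface
> > > > name is a placeholder; LogicalExpression and BatchSchema are the
> > > > types discussed above):
> > > >
> > > >     import org.apache.drill.common.expression.LogicalExpression;
> > > >
> > > >     public interface ExpressionMaterializer {
> > > >       // Rewrites expr against the schema actually found at
> > > >       // runtime, inserting casts and validating the tree; returns
> > > >       // a new, fully typed expression tree.
> > > >       LogicalExpression materialize(LogicalExpression expr,
> > > >                                     BatchSchema schema);
> > > >     }
> > > >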
> > > > - LogicalExpression to Java expression conversion: We need to be
> > > > able to convert our logical expressions into Java code
> > > > expressions.  Initially, this should be done in a simplistic way,
> > > > using things like implicit boxing just to get something working.
> > > > This will likely be specialized per major type (nullable,
> > > > non-nullable and repeated), and a framework that distinguishes
> > > > LogicalExpressions by these types might actually make the most
> > > > sense.
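> > > >
> > > > For example, a nullable int add could initially be emitted as
> > > > something as naive as this (boxed Integers a and b standing in for
> > > > nullable ints; correctness first, performance later):
> > > >
> > > >     // null in, null out; boxing keeps the first cut simple.
> > > >     Integer out = (a == null || b == null)
> > > >         ? null
> > > >         : Integer.valueOf(a.intValue() + b.intValue());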
> > > >
> > > > *JDBC*
> > > > - The Drill JDBC driver layer needs to be updated to leverage our
> > > > ZooKeeper coordination locations so that it can correctly find the
> > > > cluster location.
> > > > - The Drill JDBC driver should also manage reconnects so that if
> > > > it loses its connection to a particular Drillbit partner, it will
> > > > reconnect to another available node in the cluster.
> > > > - Someone should point SQuirreL at Julian's latest work and see
> > > > how things go...
> > > >
> > > > *ByteCode Engineering*
> > > > - We need to put together a concrete class materialization
> > > > strategy.  My thinking for relational operators and code
> > > > generation is that in most cases, we'll have an interface and a
> > > > template class for a particular relational operator.  We will
> > > > build a template class that has all the generic stuff implemented
> > > > but makes calls to empty methods where it expects lower level
> > > > operations to occur.  This allows things like looping and certain
> > > > types of null management to be fully materialized in source code
> > > > without having to deal with the complexities of ByteCode
> > > > generation.  It also reduces testing complexity.  When a
> > > > particular implementation is required, the Drillbit will be
> > > > responsible for generating updated method bodies as required for
> > > > the record-level expressions, marking all the methods and the
> > > > class as final, then loading the implementation into the
> > > > query-level classloader.  Note that the production Drillbit will
> > > > never load the template class into the JVM and will simply utilize
> > > > it in ByteCode form.  I was hoping someone could take a look at
> > > > pulling together a cohesive approach to doing this using ASM and
> > > > Janino (likely utilizing the JDK commons-compiler mode).  The
> > > > interface should be pretty simple: input is an interface, a
> > > > template class name, a set of (method_signature, method_body_text)
> > > > objects and a varargs of objects that are required for object
> > > > instantiation.  The return should be an instance of the interface.
> > > > The implementation should check things like each provided
> > > > method_signature matching an available method block, the method
> > > > blocks being replaced being empty, the object constructor matching
> > > > the set of object arguments provided by the instantiation request,
> > > > etc.
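> > > >
> > > > In other words, something like this (all names are placeholders):
> > > >
> > > >     import java.util.List;
> > > >
> > > >     public interface ClassMaterializer {
> > > >       // Pairs a template method's signature with the Java source
> > > >       // text that should become its new body.
> > > >       class MethodBody {
> > > >         public final String signature;
> > > >         public final String bodyText;
> > > >         public MethodBody(String signature, String bodyText) {
> > > >           this.signature = signature;
> > > >           this.bodyText = bodyText;
> > > >         }
> > > >       }
> > > >
> > > >       // Returns an instance of iface backed by the template class
> > > >       // with the given empty methods filled in; constructorArgs
> > > >       // are forwarded to the template's constructor.
> > > >       <T> T materialize(Class<T> iface, String templateClassName,
> > > >                         List<MethodBody> methods,
> > > >                         Object... constructorArgs);
> > > >     }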
> > > >
> > > > *ByteBuf Improvements*
> > > > - Our BufferAllocator should support child allocators (getChild())
> > > > with their own memory maximums and accounting (so we can determine
> > > > the memory overhead of particular queries).  We also need to be
> > > > able to release an entire child allocation at once.
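> > > >
> > > > A sketch of the contract (method names beyond getChild() are
> > > > suggestions):
> > > >
> > > >     import io.netty.buffer.ByteBuf;
> > > >
> > > >     public interface BufferAllocator extends AutoCloseable {
> > > >       ByteBuf buffer(int size);   // allocate from this allocator
> > > >
> > > >       // Child scoped to one query, with its own limit and
> > > >       // accounting so per-query memory overhead is measurable.
> > > >       BufferAllocator getChild(long maxBytes);
> > > >
> > > >       long getAllocatedMemory();  // bytes currently accounted here
> > > >
> > > >       @Override
> > > >       void close();  // releases this allocator's entire
> > > >                      // allocation, children included
> > > >     }
> > > >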
> > > > - We need to create a number of primitive-type-specific wrapping
> > > > classes for ByteBuf.  These additions include fixed offset
> > > > indexing for operations (e.g. index 1 of an int buffer should be
> > > > at byte 4), support for unsigned values (my preference would be to
> > > > leverage the work in Guava if that makes sense) and converting the
> > > > hard bounds checks into softer assert checks to increase
> > > > production performance.  While we could do this through the
> > > > ByteBuf interface, from everything I've experienced and read, we
> > > > need to minimize issues with inlining and performance, so we
> > > > really need to be able to modify/refer to
> > > > PooledUnsafeDirectByteBuf directly for the wrapping classes.  Of
> > > > course, it is a final package-private class.  Short term, that
> > > > means we need to create a number of specific buffer types that
> > > > wrap it and just put them in the io.netty.buffer package (or
> > > > alternatively create a Drill version or wrapper).
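> > > >
> > > > For the int case, something like this (placed in io.netty.buffer
> > > > to reach the package-private class; names are placeholders):
> > > >
> > > >     package io.netty.buffer;
> > > >
> > > >     public final class IntBufferWrapper {
> > > >       // Package-private Netty class, hence living in this package.
> > > >       private final PooledUnsafeDirectByteBuf buf;
> > > >
> > > >       IntBufferWrapper(PooledUnsafeDirectByteBuf buf) {
> > > >         this.buf = buf;
> > > >       }
> > > >
> > > >       public int get(int index) {
> > > >         // Soft check: compiled away unless the JVM runs with -ea.
> > > >         assert index >= 0 && (index + 1) * 4 <= buf.capacity();
> > > >         return buf.getInt(index * 4);  // index 1 lives at byte 4
> > > >       }
> > > >
> > > >       public void set(int index, int value) {
> > > >         assert index >= 0 && (index + 1) * 4 <= buf.capacity();
> > > >         buf.setInt(index * 4, value);
> > > >       }
> > > >
> > > >       public long getUnsigned(int index) {
> > > >         // Or Guava's UnsignedInts.toLong(get(index)).
> > > >         return get(index) & 0xFFFFFFFFL;
> > > >       }
> > > >     }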
> > >
> >
>
