Re: B[yi]teSize execwork tasks someone could potentially help out with...

kishore g Fri, 26 Apr 2013 11:14:04 -0700

I have interacted with the author; let me know if you want me to check.
The good thing is that he is responsive and even added a few things for me.



On Fri, Apr 26, 2013 at 10:27 AM, Timothy Chen <[email protected]> wrote:

> Ya, just bringing that up again. Doubt it will be a blocker.
>
> Tim
>
>
> On Fri, Apr 26, 2013 at 10:12 AM, David Alves <[email protected]>
> wrote:
>
> > good point, i'll try and ask the author.
> > it's a pretty recent lib so that might be an oversight…
> >
> > -david
> >
> > On Apr 26, 2013, at 12:04 PM, Timothy Chen <[email protected]> wrote:
> >
> > > Jacques I think this is the one I emailed you before that has no
> > licensing info.
> > >
> > > Tim
> > >
> > > Sent from my iPhone
> > >
> > > On Apr 26, 2013, at 9:30 AM, David Alves <[email protected]>
> wrote:
> > >
> > >> i've looked through it and looks like it can leverage shared memory,
> > which I was looking for anyway.
> > >> I also like the way garbage collection works (gc in java also clears
> > off-heap).
> > >> I'll take a deeper look during the weekend.
> > >>
> > >> -david
> > >>
> > >> On Apr 26, 2013, at 11:25 AM, Jacques Nadeau <[email protected]>
> > wrote:
> > >>
> > > >>> I've looked at that in the past and think the idea of using it here is
> > very
> > > >>> good.  It seems like ByteBuf is nice as it has things like endianness
> > >>> capabilities, reference counting and management and Netty direct
> > support.
> > >>> On the flipside, larray is nice for its large array capabilities and
> > >>> better input/output interfaces.  The best approach might be to define
> > a new
> > >>> ByteBuf implementation that leverages LArray.  I'll take a look at
> > this in
> > >>> a few days if someone else doesn't want to.
> > >>>
> > >>> j
> > >>>
> > >>> On Fri, Apr 26, 2013 at 8:39 AM, kishore g <[email protected]>
> > wrote:
> > >>>
> > > >>>> For *ByteBuf Improvements*, have you looked at LArrayJ
> > >>>> https://github.com/xerial/larray. It has those wrappers and I found
> > it
> > > >>>> quite useful. The same person has also written a Java version of
> snappy
> > >>>> compression. Not sure if you guys have plan to add compression, but
> > one of
> > >>>> the nice things I could do was use the memory offsets for
> > source(compressed
> > >>>> data) and dest(uncompressed array) and do the decompression
> off-heap.
> > It
> > >>>> supports the need for looking up by index and has wrappers for most
> > of the
> > >>>> primitive data types.
> > >>>>
> > >>>> Are you looking at something like this?
> > >>>>
> > >>>> thanks,
> > >>>> Kishore G
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Fri, Apr 26, 2013 at 7:53 AM, Jacques Nadeau <[email protected]
> >
> > >>>> wrote:
> > >>>>
> > >>>>> They are on the list but the list is long :)
> > >>>>>
> > >>>>> Have a good weekend.
> > >>>>>
> > >>>>> On Thu, Apr 25, 2013 at 9:51 PM, Timothy Chen <[email protected]>
> > wrote:
> > >>>>>
> > >>>>>> So if no one picks anything up you will be done with all the work
> in
> > >>>> the
> > >>>>>> next couple of days? :)
> > >>>>>>
> > > >>>>>> Would like to help out but I'm traveling to LA over the weekend.
> > >>>>>>
> > >>>>>> I'll sync with you Monday to see how I can help then.
> > >>>>>>
> > >>>>>> Tim
> > >>>>>>
> > >>>>>> Sent from my iPhone
> > >>>>>>
> > >>>>>> On Apr 25, 2013, at 9:06 PM, Jacques Nadeau <[email protected]>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> I'm working on the execwork stuff and if someone would like to
> help
> > >>>>> out,
> > >>>>>>> here are a couple of things that need doing.  I figured I'd drop
> > them
> > >>>>>> here
> > >>>>>>> and see if anyone wants to work on them in the next couple of
> days.
> > >>>> If
> > >>>>>> so,
> > >>>>>>> let me know otherwise I'll be picking them up soon.
> > >>>>>>>
> > >>>>>>> *RPC*
> > >>>>>>> - RPC Layer Handshakes: Currently, I haven't implemented the
> > >>>> handshake
> > >>>>>> that
> > >>>>>>> should happen in either the User <> Bit or the Bit <> Bit layer.
> >  The
> > >>>>>> plan
> > >>>>>>> was to use an additional inserted event handler that removed
> itself
> > >>>>> from
> > >>>>>>> the event pipeline after a successful handshake or disconnected
> the
> > >>>>>> channel
> > >>>>>>> on a failed handshake (with appropriate logging).  The main
> > >>>> validation
> > >>>>> at
> > >>>>>>> this point will be simply confirming that both endpoints are
> > running
> > >>>> on
> > >>>>>> the
> > >>>>>>> same protocol version.   The only other information that is
> > currently
> > > >>>>>>> needed is that in the Bit <> Bit communication, the client
> > >>>> should
> > >>>>>>> inform the server of its DrillEndpoint so that the server can
> then
> > >>>> map
> > >>>>>> that
> > >>>>>>> for future communication in the other direction.
> > >>>>>>>
> > >>>>>>> *DataTypes*
> > >>>>>>> - General Expansion: Currently, we have a hodgepodge of datatypes
> > >>>>> within
> > >>>>>>> the org.apache.drill.common.expression.types.DataType.  We need
> to
> > >>>>> clean
> > >>>>>>> this up.  There should be types that map to standard sql types.
>  My
> > >>>>>>> thinking is that we should actually have separate types for each
> > for
> > >>>>>>> nullable, non-nullable and repeated (required, optional and
> > repeated
> > >>>> in
> > > >>>>>>> protobuf vernacular) since we'll generally operate with those
> > values
> > >>>>>>> completely differently (and that each type should reveal which it
> > >>>> is).
> > >>>>>> We
> > >>>>>>> should also have a relationship mapping from each to the other
> > (e.g.
> > >>>>> how
> > >>>>>> to
> > > >>>>>>> convert a signed 32 bit int into a nullable signed 32 bit int).
> > >>>>>>>
> > >>>>>>> - Map Types: We don't need nullable but we will need different
> map
> > >>>>> types:
> > > >>>>>>> inline and fieldwise.  I think these will be useful for the
> execution
> > >>>>> engine
> > > >>>>>>> and will be leveraged depending on the particular needs-- for
> > example
> > >>>>>>> fieldwise will be a natural fit where we're operating on columnar
> > >>>> data
> > >>>>>> and
> > >>>>>>> doing an explode or other fieldwise nested operation and inline
> > will
> > >>>> be
> > >>>>>>> useful when we're doing things like sorting a complex field.
> >  Inline
> > >>>>> will
> > >>>>>>> also be appropriate where we have extremely sparse record sets.
> > >>>> We'll
> > >>>>>> just
> > >>>>>>> need transformation methods between the two variations.  In the
> > case
> > >>>>> of a
> > >>>>>>> fieldwise map type field, the field is virtual and only exists to
> > >>>>> contain
> > >>>>>>> its child fields.
> > >>>>>>>
> > > >>>>>>> - Non-static DataTypes: We need types that don't fit the
> > >>>> static
> > >>>>>> data
> > >>>>>>> type model above.  Examples include fixed width types (e.g. 10
> byte
> > >>>>>>> string), polymorphic (inline encoded) types (number or string
> > >>>> depending
> > >>>>>> on
> > >>>>>>> record) and repeated nested versions of our other types.  These
> > are a
> > >>>>>>> little more gnarly as we need to support canonicalization of
> these.
> > >>>>>> Optiq
> > >>>>>>> has some methods for how to handle this kind of type system so it
> > >>>>>> probably
> > >>>>>>> makes sense to leverage that system.
> > >>>>>>>
> > >>>>>>> *Expression Type Materialization*
> > >>>>>>> - LogicalExpression type materialization: Right now,
> > >>>> LogicalExpressions
> > >>>>>>> include support for late type binding.  As part of the record
> batch
> > >>>>>>> execution path, these need to get materialized with correct
> > casting,
> > >>>>> etc
> > >>>>>>> based on the actual found schema.  As such, we need to have a
> > >>>> function
> > >>>>>>> which takes a LogicalExpression tree, applies a materialized
> > >>>>> BatchSchema
> > >>>>>>> and returns a new LogicalExpression tree with full type settings.
> >  As
> > >>>>>> part
> > >>>>>>> of this process, all types need to be cast as necessary and full
> > >>>>>> validation
> > > >>>>>>> of the tree should be done.  Timothy has pending work for
> > >>>> validation
> > >>>>>>> specifically on a pull request that would be a good piece of code
> > to
> > > >>>>>>> leverage for that need.  We also have a visitor model for the
> > expression
> > >>>>> tree
> > >>>>>>> that should be able to aid in the updated LogicalExpression
> > >>>>> construction.
> > > >>>>>>> - LogicalExpression to Java expression conversion: We need to be
> > able
> > >>>> to
> > >>>>>>> convert our logical expressions into Java code expressions.
> > >>>> Initially,
> > >>>>>>> this should be done in a simplistic way, using something like
> > >>>> implicit
> > >>>>>>> boxing and the like just to get something working.  This will
> > likely
> > >>>> be
> > >>>>>>> specialized per major type (nullable, non-nullable and repeated)
> > and
> > >>>> a
> > > >>>>>>> framework might make the most sense, actually just distinguishing the
> > >>>>>>> LogicalExpression by these types.
> > >>>>>>>
> > >>>>>>> *JDBC*
> > >>>>>>> - The Drill JDBC driver layer needs to be updated to leverage our
> > >>>>>> zookeeper
> > >>>>>>> coordination locations so that it can correctly find the cluster
> > >>>>>> location.
> > >>>>>>> - The Drill JDBC driver should also manage reconnects so that if
> it
> > >>>>> loses
> > >>>>>>> connection with a particular Drillbit partner, that it will
> > reconnect
> > >>>>> to
> > >>>>>>> another available node in the cluster.
> > >>>>>>> - Someone should point SQuirreL at Julian's latest work and see
> how
> > >>>>>> things
> > >>>>>>> go...
> > >>>>>>>
> > >>>>>>> *ByteCode Engineering*
> > >>>>>>> - We need to put together a concrete class materialization
> > strategy.
> > >>>>> My
> > >>>>>>> thinking for relational operators and code generation is that in
> > most
> > >>>>>>> cases, we'll have an interface and a template class for a
> > particular
> > >>>>>>> relational operator.  We will build a template class that has all
> > the
> > >>>>>>> generic stuff implemented but will make calls to empty methods
> > where
> > >>>> it
> > >>>>>>> expects lower level operations to occur.  This allows things like
> > the
> > >>>>>>> looping and certain types of null management to be fully
> > materialized
> > >>>>> in
> > >>>>>>> source code without having to deal with the complexities of
> > ByteCode
> > >>>>>>> generation.  It also eases testing complexity.  When a particular
> > >>>>>>> implementation is required, the Drillbit will be responsible for
> > >>>>>> generating
> > >>>>>>> updated method bodies as required for the record-level
> expressions,
> > >>>>>> marking
> > >>>>>>> all the methods and class as final, then loading the
> implementation
> > >>>>> into
> > >>>>>>> the query-level classloader.  Note that the production Drillbit
> > will
> > >>>>>> never
> > >>>>>>> load the template class into the JVM and will simply utilize it
> in
> > >>>>>> ByteCode
> > >>>>>>> form.  I was hoping someone can take a look at trying to pull
> > >>>> together
> > >>>>> a
> > >>>>>>> cohesive approach to doing this using ASM and Janino (likely
> > >>>> utilizing
> > >>>>>> the
> > >>>>>>> JDK commons-compiler mode).  The interface should be pretty
> simple:
> > >>>>> input
> > >>>>>>> is an interface, a template class name, a set of
> (method_signature,
> > >>>>>>> method_body_text) objects and a varargs of objects that are
> > required
> > >>>>> for
> > >>>>>>> object instantiation.  The return should be an instance of the
> > >>>>> interface.
> > >>>>>>> The interface should check things like method_signature provided
> to
> > >>>>>>> available method blocks, the method blocks being replaced are
> > empty,
> > >>>>> the
> > > >>>>>>> object constructor matches the set of object arguments provided by
> > the
> > >>>>>>> object instantiation request, etc.
> > >>>>>>>
> > >>>>>>> *ByteBuf Improvements*
> > >>>>>>> - Our BufferAllocator should support child allocators
> (getChild())
> > >>>> with
> > >>>>>>> their own memory maximums and accounting (so we can determine the
> > >>>>> memory
> > >>>>>>> overhead to particular queries).  We also need to be able to
> > release
> > >>>>>> entire
> > >>>>>>> child allocations at once.
> > >>>>>>> - We need to create a number of primitive type specific wrapping
> > >>>>> classes
> > >>>>>>> for ByteBuf.  These additions include fixed offset indexing for
> > >>>>>> operations
> > >>>>>>> (e.g. index 1 of an int buffer should be at 4 bytes), adding
> > support
> > >>>>> for
> > >>>>>>> unsigned values (my preference would be to leverage the work in
> > Guava
> > >>>>> if
> > >>>>>>> that makes sense) and modifying the hard bounds checks to softer
> > >>>> assert
> > >>>>>>> checks to increase production performance.  While we could do
> this
> > >>>>>>> utilizing the ByteBuf interface, from everything I've experienced
> > and
> > >>>>>> read,
> > >>>>>>> we need to minimize issues with inlining and performance so we
> > really
> > >>>>>> need
> > >>>>>>> to be able to modify/refer to PooledUnsafeDirectByteBuf directly
> > for
> > >>>>> the
> > >>>>>>> wrapping classes.  Of course, it is a final package private
> class.
> > >>>>> Short
> > >>>>>>> term that means we really need to create a number of specific
> > buffer
> > >>>>>> types
> > >>>>>>> that wrap it and just put them in the io.netty.buffer package (or
> > >>>>>>> alternatively create a Drill version or wrapper).
> > >>
> >
> >
>
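A minimal sketch of the child-allocator accounting from the *ByteBuf Improvements* item, assuming plain `java.nio.ByteBuffer` rather than Netty buffers. `AllocatorSketch` and all of its methods other than the `getChild()` name mentioned in the original message are hypothetical. It shows per-child memory maximums, usage rolling up to the parent so per-query overhead can be read off, and releasing an entire child allocation at once. It tracks accounting only: the direct memory itself is left to the JVM, and the class is not thread-safe.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; accounting only, not thread-safe, not Drill's BufferAllocator.
final class AllocatorSketch {
    private final AllocatorSketch parent;
    private final long maxBytes;
    private final List<AllocatorSketch> children = new ArrayList<>();
    private long usedBytes;
    private boolean closed;

    AllocatorSketch(long maxBytes) {
        this(null, maxBytes);
    }

    private AllocatorSketch(AllocatorSketch parent, long maxBytes) {
        this.parent = parent;
        this.maxBytes = maxBytes;
    }

    // Child with its own maximum; its usage also counts against this allocator.
    AllocatorSketch getChild(long childMaxBytes) {
        AllocatorSketch child = new AllocatorSketch(this, childMaxBytes);
        children.add(child);
        return child;
    }

    ByteBuffer buffer(int size) {
        reserve(size);
        return ByteBuffer.allocateDirect(size);
    }

    // Checks this allocator's limit, then every ancestor's, before recording usage.
    private void reserve(long size) {
        if (closed) throw new IllegalStateException("allocator closed");
        if (usedBytes + size > maxBytes) throw new IllegalStateException("over limit");
        if (parent != null) parent.reserve(size);
        usedBytes += size;
    }

    private void release(long size) {
        usedBytes -= size;
        if (parent != null) parent.release(size);
    }

    private void markClosed() {
        for (AllocatorSketch child : children) child.markClosed();
        closed = true;
        usedBytes = 0;
    }

    // Releases everything accounted to this allocator and its children at once.
    void close() {
        if (closed) return;
        if (parent != null) parent.release(usedBytes);
        markClosed();
    }

    long getUsedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        AllocatorSketch root = new AllocatorSketch(1024);
        AllocatorSketch query = root.getChild(256);
        query.buffer(100);
        System.out.println(root.getUsedBytes());
        query.close();
        System.out.println(root.getUsedBytes());
    }
}
```

Because a child's reservations are also recorded on every ancestor, closing a child only has to subtract its own total from the chain above it.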
