Hello Drillers,
Here are some minutes from the meeting this week, along with some other
notes I kept forgetting to send out. Unfortunately I did miss the meeting
last week.

-Jason


On Tue, Dec 10, 2013 at 10:38 AM, Jacques Nadeau <[email protected]> wrote:

> Look forward to any reports back from meeting
>
attendees: Jason Altekruse, Jinfeng, Timothy Chen, Julian

Aman (spelling to be confirmed)
- working on aggregate functions, filed a JIRA
    - sent out code review
- based on code generation for aggregate functions
    - hash aggregation work, has been talking to Jacques
    - prototype in next few weeks

Jinfeng
    - explicit casting
        - submitted review board request
        - working with Yash on implicit casting
        - working design for implementing implicit casting this week
    
John
    - no updates

Tim
    - Amazon on EMR
    - trying to get bootstrap action to work
    - have to write a script that is hard to test
        - has to go back and forth with them
        - one iteration was sent recently and is waiting for a response
    - broadcast sender
        - pending review request
    - will work more on hash join next

Julian
    - talked to Steve McFurson
        - said he knew Tim
        - he is interested in Julian's work on Drill, Mondrian and Optiq
    - went to MapR on Thursday
    - ODBC/JDBC driver
    - breaking up Optiq
        - pull request on optiq from Jacques with API changes
        - Drill and Optiq are doing things in common
        - hoping API changes are going to allow sharing more functionality for
          Drill and Optiq
        - Drill got around virtual function calls
        - common sub-expression elimination in Drill?
            - avoiding re-evaluating the same predicates/math functions
    - Optiq is too monolithic
        - this is why we didn't use RexNode
    - common sub-expression elimination
        - Rex Program
            - eliminates common sub expressions with project and filter
            - normalize expressions into common form
                - a+b = b+a
                - does not work for all operators; Java is particular about the
                  order of AND
            - constant folding
                - convert constant expressions into simple constants
            - Drill is strong in runtime
                - running generated code fast in execution
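
The three ideas in Julian's notes above (reusing common sub-expressions, normalizing commutative operators, and constant folding) can be sketched together in a few lines. This is only a toy illustration: `ExprProgram`, `leaf` and `call` are hypothetical names invented here and are not Optiq's RexProgram API. It assumes expressions are identified by a canonical string, that only `+` and `*` are safe to reorder, and that only integer literals are folded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy expression program; not Optiq's RexProgram. */
final class ExprProgram {
    /** Distinct sub-expressions in registration order; the index is the slot id. */
    private final List<String> slots = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    /** Registers a leaf such as a column name or an integer literal. */
    int leaf(String name) {
        return intern(name);
    }

    /** Registers a binary call, reusing an existing slot when an equal one exists. */
    int call(String op, int left, int right) {
        String l = slots.get(left);
        String r = slots.get(right);
        boolean commutative = op.equals("+") || op.equals("*");
        // Constant folding: two integer literals collapse into a single literal.
        // The *Exact methods throw on overflow instead of silently wrapping.
        if (commutative && isInt(l) && isInt(r)) {
            long a = Long.parseLong(l);
            long b = Long.parseLong(r);
            long folded = op.equals("+") ? Math.addExact(a, b) : Math.multiplyExact(a, b);
            return intern(Long.toString(folded));
        }
        // Normalization: sort operands, but only for commutative operators.
        // Operators such as AND or "-" keep their operand order.
        if (commutative && l.compareTo(r) > 0) {
            String tmp = l;
            l = r;
            r = tmp;
        }
        return intern("(" + l + " " + op + " " + r + ")");
    }

    /** Number of distinct sub-expressions registered so far. */
    int size() {
        return slots.size();
    }

    private static boolean isInt(String s) {
        return s.matches("-?\\d+");
    }

    /** Returns the slot for a canonical key, creating it on first use. */
    private int intern(String key) {
        Integer existing = index.get(key);
        if (existing != null) {
            return existing;
        }
        slots.add(key);
        index.put(key, slots.size() - 1);
        return slots.size() - 1;
    }
}
```

With this sketch, `call("+", a, b)` and `call("+", b, a)` return the same slot, so a project and a filter that both use the expression would evaluate it once, while `call("AND", x, y)` and `call("AND", y, x)` stay distinct.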

Jason
    - need to look at parquet NPE

attendees: Ben, Jacques, Jinfeng, Jason, Julian, Steven

Jacques
    - parquet writer
    - currently don't have thin client
        - work on separating client from rest of Drill
    - Avatica?
    - designed to be asynchronous?
        - not specifically one way or the other
        - no thread pools or services
                - up to whoever provides the transport
                - sqlline is synchronous
            - tries to hold it in memory
            - wants to print column headers
                - patch it to truncate it
                - option you can set to not retrieve the whole result set
Ben
    - bug fixes
        - join issue 301
    - on the list NPE
        - enable error logging
        - logging to disk?
        - might be a problem with parquet reader
            - installed with binary, may not have bug fixes
        - circular buffer for logging, keeps it a reasonable size but can catch
          the last 500 or 1000 events/errors
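
The circular-buffer idea from Ben's notes could look roughly like the following. It is a minimal sketch, not an existing Drill or logging-library class: `RecentEvents` and its methods are hypothetical names, and it assumes events are plain strings and that simple `synchronized` methods are enough for the callers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded buffer that keeps only the most recent log events. */
final class RecentEvents {
    private final int capacity;
    private final ArrayDeque<String> events;

    RecentEvents(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.events = new ArrayDeque<>(capacity);
    }

    /** Appends an event, dropping the oldest one once the buffer is full. */
    synchronized void add(String event) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(event);
    }

    /** Returns a copy of the retained events, oldest first. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(events);
    }
}
```

A buffer created with `new RecentEvents(1000)` never holds more than 1000 entries, so memory stays bounded while the events leading up to an error such as the NPE are still available when it is dumped.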
    
Mehant
    - map stuff
        - alternate approach
        - our own drill data type
            - keep optiq validator happy, any type
            - answered in e-mail
                - 
            - can you select * from a table with no knowledge of columns
                - should be an error
                - jacques disagrees
                    - obviously we want to select * from files
                    - JDBC doesn't require that the record set has defined
                      columns until return
                        - cannot change meta-data while query is running
                        - cannot send some records with one schema and change it
                          with later results
                        - need to return a map
                        - select * means expand column list with meta-data you
                          have


Jason
- file bug for memory usage in tests
    - use same base class for all tests
- defined interface for column select, filter pushdown and limit

Jinfeng
- join optimization in Optiq
    - condition is in where clause instead of on clause
    - join sequences are having a Cartesian join
    - where optiq is enumerating join sequences
    - does it enumerate all trees?
        - multi-join rel
            - a lot of joins into the same relational expression
            - heuristic cost-based algorithm for join order likely to have
              lowest cost
        - need to apply associativity rule for join
        - swap join rule
            - if 3 tables need to 
            - convert left deep tree to right deep tree
            - PushJoinThroughJoinRule
            - thing to remember
                - if you have a 5 way join, order 2^n join orderings
                - have to deal with combinatorial expansion
                - cannot use Optiq's normal join ordering
                - multi-join rel should be used
                    - approximation for best ordering
                    - what is the threshold for use?
                        - 6 joins
                - generate a few join orderings
                    - could take that into next stage of processing
                    - this is the approach hive is taking
                    - they had same question, there is a JIRA for discussion
            - HIVE building their own RELs?
                - building their own calling convention
                - logical tree -> hive rels
        - does optiq have select into or create table as?
            - stayed clear of DDL
            - does have support for insert
            - doesn't need help from optimizer
            - but syntax parsing
                - feel free to contribute it
                - no optimizer changes needed
            - insert into does need optimizer
                - last link in the pipeline needs to return the number of rows
                  inserted
            - does it support hints?
                - no
        - want to get optiq into apache incubator status
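
On the join-ordering discussion above: the notes put the growth at the order of 2^n, and the exact number of candidate plans depends on which tree shapes the planner admits. As a rough illustration of the combinatorial expansion, the sketch below counts plans under two standard textbook assumptions that were not stated in the meeting: left-deep trees only (n! orderings), and bushy trees with ordered children ((2(n-1))! / (n-1)! plans), in both cases without pruning Cartesian products. `JoinOrderCounts` is a hypothetical helper, not part of Optiq or Drill.

```java
/** Hypothetical helper that counts join plans for n input tables. */
final class JoinOrderCounts {
    private JoinOrderCounts() {
    }

    /** Left-deep trees: one plan per permutation of the n inputs, so n!. */
    static long leftDeep(int n) {
        long count = 1;
        for (int i = 2; i <= n; i++) {
            count = Math.multiplyExact(count, (long) i);
        }
        return count;
    }

    /** Bushy trees with ordered children: (2(n-1))! / (n-1)!, i.e. n * (n+1) * ... * (2n-2). */
    static long bushy(int n) {
        long count = 1;
        for (int i = n; i <= 2 * (n - 1); i++) {
            count = Math.multiplyExact(count, (long) i);
        }
        return count;
    }
}
```

For the 5-way join mentioned in the notes this gives 120 left-deep orderings and 1680 bushy plans, which is why a heuristic such as the multi-join rel is attractive once the number of joins passes a small threshold.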
Steven
- re-submit patch for spooling
- writing queries for performance testing

attendees: Michael (Amazon), Jacques, Jin Feng, Amahn, Mehant, Steven, Michael, Julian

parquet reader
        - too many cpu cycles
        - jason - push code for selecting columns!

Optiq
        - first pass at relating our operator to optiq rels
        - try to understand logic of larger classes
                - hive guys have similar requirement, using a subset of optiq
                - built a class for them called Frameworks, basically a static
                  method
                        - provide it with code you want it to call when the
                          environment is set up
                        - rather than subclassing a bunch of things
                        - current method is brittle, optiq is constantly
                          changing
                        - is Frameworks something in the repo?
                                - in master, called Frameworks
                        - optiq prepare_______ is very complex
                                - look at a better way to organize it
                        - challenging scenario is, we want to go from SQL
                          query to some intermediate
                                - expand grammar to support other concepts in
                                  optiq
                                - DDL operation
                                - send it further down the path, convert to rel
                                  nodes
                                - maybe convert to logical plan
                                - maybe go to other rel nodes, start optimizing
                                - take physical rel nodes, generate physical
                                  plan
                                - want to drop out of optiq at various places
                                - take logical plan, transform into DrillRelNode
                                - do optimization on a logical plan
                        - all sounds reasonable
                                - not just rel nodes
                                - other pieces of state, like type factory
                                - populated validator, needs to come along
                                        - type info not stored in tree, in
                                          validator
                                - Julian - need to do work to decide which
                                  state is needed in each stage of the pipeline
                                        - Jacques - would like to sit and talk
                                          about it
                                        - Julian - set a time next week
                                                - Monday, Tuesday, Thursday best


Mehant
        - wrapped up the underscore map stuff
                - select * issues
        - get changeset reviewed
                - remove _map from drill
        - small patches to optiq
        - queries will be much simpler to write
                - only difference now is just table names look like filenames

Amahn - spelling?
        - new at MapR
        - working on columnar database system at ___
        - worked with query optimization, MPP optimizer
        - query execution aspects, aggregations and joins, set operations
        - at IBM, worked on OLAP
        - at Informix, worked on Red Brick

Jin Feng
        - join order
        - minor changes in Drill to use new optiq code, need update in conjars
        - explicit casting
                - physical/logical plan
                - match cast code that is automatically generated
                - unit test, is working, will wrap up
                - connect with Yash, deciding type cast compatibility between
                  types
                - allows explicit casting
                        - optiq needed casts to be happy but was removing them

Steven
        - found memory problems
                - changes in some places fixed some of the problems
                - problem with sorts
                        - thought he fixed it, cannot get memory released
