Hello Drillers,
Here are some minutes from the meeting this week, along with some other
notes I kept forgetting to send out. Unfortunately I did miss the meeting
last week.
-Jason
On Tue, Dec 10, 2013 at 10:38 AM, Jacques Nadeau <[email protected]> wrote:
> Look forward to any reports back from meeting
>
attendees: Jason Altekruse, Jinfeng, Timothy Chen, Julian
Aman (spelling unconfirmed)
- working on aggregate functions, filed a JIRA
- sent out code review
- based on code generation for aggregate functions
- hash aggregation work, has been talking to Jacques
- prototype in next few weeks
Jinfeng
- explicit casting
- submitted review board request
- working with Yash on implicit casting
- working design for implementing implicit casting this week
John
- no updates
Tim
- Amazon on EMR
- trying to get bootstrap action to work
- have to write a script that is hard to test
- has to go back and forth with them
- one iteration was sent recently and is waiting for a response
- broadcast sender
- pending review request
- will work more on hash join next
Julian
- talked to Steve McFurson
- said he knew Tim
- he is interested in Julian's work on Drill, Mondrian and Optiq
- went to MapR on Thursday
- ODBC/JDBC driver
- breaking up Optiq
- pull request on Optiq from Jacques with API changes
- Drill and Optiq are doing things in common
- hoping API changes are going to allow sharing more functionality
between Drill and Optiq
- Drill got around virtual function calls
- common sub-expression elimination in Drill?
- avoiding re-evaluating the same predicates/math functions
- Optiq is too monolithic
- this is why we didn't use RexNode
- common sub expression elim
- Rex Program
- eliminates common sub expressions with project and filter
- normalize expressions into common form
- a+b = b+a
- does not work for all operators; Java is particular about the order
of AND
- constant folding
- convert const expression into simple constants
- Drill is strong in runtime
- running generated code fast in execution
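As a rough illustration of the three ideas in Julian's notes above (common sub-expression elimination, normalizing commutative operands so that a+b matches b+a, and constant folding), here is a minimal Python sketch. The tuple encoding and the function names are made up for this example; this is not Optiq's RexProgram or Drill's expression code. AND is deliberately left out of the commutative set, since reordering it could change short-circuit behavior.

```python
import operator

# Hypothetical expression encoding for this sketch only:
# ("op", left, right), a column name (str), or a numeric constant.
COMMUTATIVE = {"+", "*"}  # AND is intentionally not listed here
FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def normalize(expr):
    """Fold constant subtrees and put commutative operands in a fixed order."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = normalize(left), normalize(right)
    both_const = isinstance(left, (int, float)) and isinstance(right, (int, float))
    if both_const and op in FOLD:
        return FOLD[op](left, right)  # constant folding
    if op in COMMUTATIVE:
        # Sorting by repr gives a+b and b+a the same canonical form.
        left, right = sorted((left, right), key=repr)
    return (op, left, right)


def eliminate_common(exprs):
    """Return (program, outputs) where each distinct subexpression occurs once.

    program is a list of nodes whose operands are columns, constants, or
    ("ref", i) pointers to earlier program entries.
    """
    program, index = [], {}

    def visit(expr):
        if not isinstance(expr, tuple):
            return expr  # columns and constants stay inline
        op, left, right = expr
        node = (op, visit(left), visit(right))
        if node not in index:
            index[node] = len(program)
            program.append(node)
        return ("ref", index[node])

    outputs = [visit(normalize(e)) for e in exprs]
    return program, outputs


# A projected expression and a filter expression that share a+b.
program, outputs = eliminate_common([
    ("+", "a", "b"),
    ("*", ("+", "b", "a"), ("+", 1, 2)),
])
```

In this run a+b is computed once and referenced from both expressions, and 1+2 is folded to 3 before the program is built.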
Jason
- need to look at parquet NPE
Ben, Jacques, Jinfeng, Jason, Julian, Steven
Jacques
- parquet writer
- currently don't have thin client
- work on separating client from rest of Drill
- Avatica?
- designed to be asynchronous?
- not specifically one way or the other
- no thread pools or services
- up to whoever provides the transport
- sqlline is synchronous
- tries to hold it in memory
- wants to print column headers
- patch it to truncate it
- option you can set to not retrieve the whole result set
Ben
- bug fixes
- join issue 301
- on the list NPE
- enable error logging
- logging to disk?
- might be a problem with parquet reader
- installed with binary, may not have bug fixes
- circular buffer for logging, keeps it a reasonable size but can catch
last 500, 1000 events/errors
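The circular-buffer idea from Ben's notes above could look roughly like the sketch below. It is written in Python purely for illustration; the handler class and logger name are hypothetical, and this is not Drill's actual logging configuration. A bounded deque drops the oldest entry once the capacity is reached, so memory stays fixed while the most recent events are still available when something goes wrong.

```python
import collections
import logging


class RingBufferHandler(logging.Handler):
    """Keep only the most recent `capacity` formatted log records in memory."""

    def __init__(self, capacity=500):
        super().__init__()
        # deque with maxlen discards the oldest item when it is full.
        self.records = collections.deque(maxlen=capacity)

    def emit(self, record):
        self.records.append(self.format(record))

    def dump(self):
        """Return the buffered messages, oldest first."""
        return list(self.records)


# Small demonstration: capacity 3, five events logged.
log = logging.getLogger("ring_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False
handler = RingBufferHandler(capacity=3)
log.addHandler(handler)
for i in range(5):
    log.error("event %d", i)

recent = handler.dump()
```

After the loop only the last three events remain in the buffer; a real setup would use a capacity such as the 500 or 1000 mentioned above and write the buffer out when an error is hit.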
Mehant
- map stuff
- alternate approach
- our own drill data type
- keep Optiq validator happy, any type
- answered in e-mail
- can you select * from a table with no knowledge of columns
- should be an error
- Jacques disagrees
- obviously we want to select * from files
- JDBC doesn't require that a record set have defined columns until
it is returned
- cannot change meta-data while query is running
- cannot send some records with one schema and change it with
later results
- need to return a map
- select * means expand column list with the meta-data you have
Jason
- file bug for memory usage in tests
- use same base class for all tests
- defined interface for column select, filter pushdown and limit
Jinfeng
- join optimization in Optiq
- condition is in where clause instead of on clause
- join sequences are having a Cartesian join
- where Optiq is enumerating join sequences
- does it enumerate all trees?
- multi-join rel
- a lot of join into same relational expression
- heuristic cost-based algorithm for join order likely to have
lowest cost
- need to apply associativity rule for join
- swap join rule
- if 3 tables need to
- convert left deep tree to right deep tree
- PushJoinThroughJoinRule
- thing to remember
- if you have a 5-way join, there are on the order of 2^n join orderings
- have to deal with combinatorial expansion
- cannot use Optiq's normal join ordering
- multi-join rel should be used
- approximation for best ordering
- what is the threshold for use?
- 6 joins
- generate a few join orderings
- could take that into next stage of processing
- this is the approach hive is taking
- they had same question, there is a JIRA for discussion
- Hive building their own RELs?
- building their own calling convention
- logical tree -> hive rels
- does Optiq have select into or create table as?
- stayed clear of DDL
- does have support for insert
- doesn't need help from optimizer
- but syntax parsing
- feel free to contribute it
- no optimizer changes needed
- insert into does need optimizer
- last link in the pipeline needs to return the number of rows inserted
- does it support hints?
- no
- want to get Optiq into Apache Incubator status
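On the join-ordering discussion in Jinfeng's notes above: the sketch below shows one way the "enumerate for small joins, approximate above a threshold" idea could work. It is a Python illustration only. The cost model (sum of intermediate result sizes), the table sizes, the selectivities, and the threshold parameter are all assumptions made up for this example, and none of it reflects how Optiq's multi-join rel actually orders joins. For left-deep trees alone, n tables already give n! candidate orders, which is why exhaustive search stops being practical quickly.

```python
import itertools
import math


def join_cost(order, rows, selectivity):
    """Sum of intermediate result sizes for a left-deep join in `order`.

    A table pair missing from `selectivity` has no join condition, so it
    contributes a factor of 1.0 (a Cartesian product).
    """
    joined, size, cost = [order[0]], float(rows[order[0]]), 0.0
    for table in order[1:]:
        factor = 1.0
        for prev in joined:
            factor *= selectivity.get(frozenset((prev, table)), 1.0)
        size = size * rows[table] * factor
        cost += size
        joined.append(table)
    return cost


def exhaustive(rows, selectivity):
    """Try every left-deep order: n! candidates for n tables."""
    return min(itertools.permutations(sorted(rows)),
               key=lambda order: join_cost(order, rows, selectivity))


def greedy(rows, selectivity):
    """Start from the smallest table, then keep adding the cheapest next table."""
    remaining = sorted(rows)
    order = [min(remaining, key=rows.get)]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining,
                  key=lambda t: join_cost(order + [t], rows, selectivity))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


def choose_order(rows, selectivity, threshold=6):
    """Enumerate when the join is small, fall back to the heuristic otherwise."""
    if len(rows) <= threshold:
        return exhaustive(rows, selectivity)
    return greedy(rows, selectivity)


# Hypothetical statistics: a-b and b-c have join conditions, a-c does not.
rows = {"a": 1000, "b": 10, "c": 100}
selectivity = {frozenset(("a", "b")): 0.01, frozenset(("b", "c")): 0.1}
best = choose_order(rows, selectivity)
approx = greedy(rows, selectivity)
```

With these numbers both strategies avoid joining a and c first, which is the Cartesian case that shows up when the condition sits in the where clause and the ordering does not account for it.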
Steven
- re-submit patch for spooling
- writing queries for performance testing
Michael (Amazon), Jacques, Jinfeng, Aman, Mehant, Steven, Michael, Julian
parquet reader
- too many cpu cycles
- Jason - push code for selecting columns!
Optiq
- first pass at relating our operator to Optiq rels
- try to understand logic of larger classes
- Hive guys have similar requirement, using a subset of Optiq
- built a class for them called Frameworks, basically a static
method
- provide it with code you want it to call when the
environment is set up
- rather than subclassing a bunch of things
- current method is brittle, Optiq is constantly
changing
- is Frameworks something in the repo?
- in master, called Frameworks
- optiq prepare_______ is very complex
- look at a better way to organize it
- the challenging scenario is that we want to go from a SQL
query to some intermediate
- expand grammar to support other concepts in
Optiq
- DDL operation
- send it further down the path, convert to rel
nodes
- maybe convert to logical plan
- maybe go to other rel nodes, start optimizing
- take physical rel nodes, generate physical
plan
- want to drop out of optiq at various places
- take logical plan, transform into DrillRelNode
- do optimization on a logical plan
- all sounds reasonable
- not just rel nodes
- other pieces of states, like type factory
- populated validator, needs to come along
- type info not stored in tree, in
validator
- Julian - need to do work to decide which
state is needed in each stage of the pipeline
- Jacques - would like to sit and talk
about it
- Julian - set a time next week
- Monday, Tuesday, Thursday best
Mehant
- wrapped up the underscore map stuff
- select * issues
- get changeset reviewed
- remove _map from drill
- small patches to optiq
- queries will be much simpler to write
- only difference now is just table names look like filenames
Aman (spelling unconfirmed)
- new at MapR
- working on columnar database system at ___
- worked with query optimization, MPP optimizer
- query execution aspects, aggregations and joins, set operations
- IBM worked on OLAP
- Informix worked on Red Brick
Jinfeng
- join order
- minor changes in Drill to use new Optiq code, need update in conjars
- explicit casting
- physical/logical plan
- match cast code that is automatically generated
- unit test, is working, will wrap up
- connect with Yash, deciding type cast compatibility between
types
- allows explicit casting
- Optiq needed casts to be happy but was removing them
Steven
- found memory problems
- changes in some places fixed some of the problems
- problem with sorts
- thought he fixed it, cannot get memory released