Hello All,
Here are the notes from today's hangout. Michael, can you copy them into the
Google doc?
participants: Jacques, Michael Hausenblas, Lisen Mu, Yash Sharma, Jinfeng,
Jason Altekruse, Harri, Steven Phillips, Timothy Chen, Julian Hyde
New employee at MapR: Jinfeng
- couple more in the next month
Jacques:
- merged limit
- clarify VVs (value vectors)
- never access internal state of VV when it is invalid
- release notes
Steven:
- ordered partitioner
- abstract out distributed cache interface
- continue to work on spooling to disk
Jason:
- semi-blocking
- look at sort and ordered hash partitioner
Yash:
- name of functions
- separate class for operators and functions for more clarity
- different operators have their own class files
Lisen:
- fork of Drill
- data pushed from leaves rather than pulled from root
- we have been thinking about this same problem
- don't want to wait for IO all the time
- pre-fetch rather than push
- in a join you might get pushed a huge amount of data when you
aren't ready for it
- stream processing
- alternative concept around foreman
- not quite right for streams
- resource allocation
- not as much for resource requirements
- HyperLogLog
- space saving
- acceptable - not precise
- data assembly - business logic
- approximations will be important to Drill
- no serious thinking about sampling
- certain types of scanners should support sampling
- hard with some sources without reading all the data anyway
- HBase might be easier to do a scan
- doing it with their own business logic and statistics
- hard to generalize
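For anyone not familiar with HyperLogLog, here is a minimal sketch of the idea
(small fixed memory, approximate distinct counts). It is not Drill code and not
Lisen's implementation; the class name, the SHA-1 hash, and p=12 (4096
registers, roughly 1.6% typical error) are assumptions picked for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch; names and parameters are hypothetical."""

    def __init__(self, p=12):
        # p index bits give m = 2**p one-byte-sized registers.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Take a 64-bit hash of the value (first 8 bytes of SHA-1).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                   # top p bits pick the register
        rest = h & ((1 << width) - 1)      # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic-mean estimate over all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(50000):
        hll.add(f"user-{i}")
    print("approximate distinct count:", hll.count())
```

The sketch keeps 4096 small counters no matter how many values are added, which
is the space saving mentioned above; the price is that the count is approximate.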
Hari:
- not much for updates
- pick up with Amazon EC2 docs
- had problem where we need 8 gigs
- cannot get it running on free micro instance
- got it working removing the direct memory flag in POM
- Tim - out of memory exception right away
- was this with or without changing the option for direct
memory?
Tim:
- WIP patch in
- AMPLab Big Data Benchmark
- having numbers for performance evaluation
- set up on their repo for Drill datasets
- installing HDFS to all of the nodes
- doesn't look too complicated
- cannot submit SQL in distributed mode because of bad optimizer
- recent review board patches
- describe code more completely
- hard to review without docs
- Julian - single PowerPoint slide per operator
- Google doc? like the logical plan doc
Ben:
- code gen portion of merging receiver
- no blockers
- getting to code review soon
Julian:
- joined Hortonworks
- working on Optiq
- helping Hive, but also working on Drill
- making Optiq everything it can be
- splitting JDBC into thin client
- thinking about it, no implementation yet
- right now pushing sorts down to Mongo
- Jacques - session next week on JDBC?
- roadmap on Optiq
- commit logs tell some of the story
- roadmap would be helpful
- will put out call for Optiq users like Drill
- put together feature list for next release(s)
- next 6 months, wants to stay agile but be more predictable
- Jinfeng will be working with optimizer and Optiq