> Here are the notes from today's hangout. Michael, can you copy them into the
> Google Doc?
Thanks & done.
Cheers,
Michael
--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/
On 22 Oct 2013, at 17:49, Jason Altekruse <[email protected]> wrote:
> Hello All,
>
> Here are the notes from today's hangout. Michael, can you copy them into the
> Google Doc?
>
> participants: Jacques, Michael Hausenblas, Lisen Mu, Yash Sharma, Jinfeng,
> Jason Altekruse, Harri, Steven Phillips, Timothy Chen, Julian Hyde
>
> New employee at MapR: Jinfeng
> - couple more in the next month
>
> Jacques:
> - merged limit
> - clarify VVs
> - never access internal state of VV when it is invalid
> - release notes
>
> Steven:
> - ordered partitioner
> - abstract out distributed cache interface
> - continue to work on spooling to disk
>
> Jason:
> - semi-blocking
> - look at sort and ordered hash partitioner
>
> Yash
> - name of functions
> - separate class for operators and functions for more clarity
> - different operators have their own class files
>
> Lisen
> - fork of Drill
> - data pushed from leaves rather than pulled from root
> - we have been thinking about this same problem
> - don't want to wait for IO all the time
> - pre-fetch rather than push
> - in a join you might get pushed a huge amount of data when you
> aren't ready for it
> - stream processing
> - alternative concept around foreman
> - not quite right for streams
> - resource allocation
> - not as much for resource requirements
> - HyperLogLog
> - space saving
> - acceptable - not precise
> - data assembly - business logic
> - approximations will be important to Drill
> - no serious thinking about sampling
> - certain types of scanners should support sampling
> - hard with some scanners without reading all data anyway
> - HBase might be easier to do a scan
> - doing it with their own business logic and statistics
> - hard to generalize
>
> Hari
> - not much for updates
> - pick up with Amazon EC2 docs
> - had problem where we need 8 gigs
> - cannot get it running on free micro instance
> - got it working removing the direct memory flag in POM
> - Tim - out of memory exception right away
> - was this with or without changing the option for direct
> memory?
>
> Tim
> - wir patch in
> - AMPLab Big Data Benchmark
> - having numbers for performance evaluation
> - set up on their repo for Drill datasets
> - installing HDFS to all of the nodes
> - doesn't look too complicated
> - cannot submit SQL in distributed mode because of bad optimizer
> - recent review board patches
> - describe code more completely
> - hard to review without docs
> - Julian - single PowerPoint slide per operator
> - Google Doc? like the logical plan doc
>
> Ben
> - code gen portion of merging receiver
> - no blockers
> - getting to code review soon
>
> Julian
> - joined Hortonworks
> - working on Optiq
> - helping Hive, but also working on Drill
> - making Optiq everything it can be
> - splitting JDBC into thin client
> - thinking about it, no implementation yet
> - right now pushing sorts down to Mongo
> - Jacques - session next week on JDBC?
> - roadmap on Optiq
> - commit logs tell some of the story
> - roadmap would be helpful
> - will put out call for Optiq users like Drill
> - put together feature list for next release(s)
> - next 6 months: want to stay agile, but also be more predictable
> - Jinfeng will be working with optimizer and Optiq