Re: meeting notes 10/22/13

Michael Hausenblas Tue, 22 Oct 2013 10:04:10 -0700

> Here are the notes from today's hangout. Michael, can you copy them into the 
> Google Doc?

Thanks & done.

Cheers,
                Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/

On 22 Oct 2013, at 17:49, Jason Altekruse <[email protected]> wrote:

> Hello All,
> 
> Here are the notes from today's hangout. Michael, can you copy them into the
> Google Doc?
> 
> Participants: Jacques, Michael Hausenblas, Lisen Mu, Yash Sharma, Jinfeng,
> Jason Altekruse, Harri, Steven Phillips, Timothy Chen, Julian Hyde
> 
> New employee at MapR: Jinfeng
>    - a couple more joining in the next month
> 
> Jacques:
>    - merged the limit operator
>    - clarify value vector (VV) semantics
>        - never access internal state of VV when it is invalid
>    - release notes
> 
> Steven:
>    - ordered partitioner
>        - abstract out distributed cache interface
>    - continue to work on spooling to disk
> 
> Jason:
>    - semi-blocking
>        - look at sort and ordered hash partitioner
> 
> Yash
>    - function naming
>        - separate class for operators and functions for more clarity
>            - different operators have their own class files
> 
> Lisen
>    - fork of Drill
>        - data pushed from leaves rather than pulled from root
>        - we have been thinking about this same problem
>            - don't want to wait for IO all the time
>            - pre-fetch rather than push
>            - in a join you might get pushed a huge amount of data when you
> aren't ready for it
>            - stream processing
>                - alternative concept around foreman
>                - not quite right for streams
>                - resource allocation
>                    - not as much for resource requirements
>        - HyperLogLog
>            - space saving
>            - approximate, not precise, but acceptable
>        - data assembly - business logic
>            - approximations will be important to Drill
>            - no serious thinking about sampling
>            - certain types of scanners should support sampling
>                - hard for some scanners without reading all the data anyway
>                - might be easier with HBase, via a scan
>            - doing it with their own business logic and statistics
>                - hard to generalize
> 
> Hari
>    - not much to update
>    - picking up the Amazon EC2 docs
>        - had a problem where we need 8 GB
>        - cannot get it running on a free micro instance
>        - got it working by removing the direct memory flag in the POM
>        - Tim: out-of-memory exception right away
>            - was this with or without changing the option for direct
> memory?
> 
> Tim
>    - wir patch in
>    - AMPLab Big Data Benchmark
>        - having numbers for performance evaluation
>        - set up on their repo for Drill datasets
>        - installing HDFS on all of the nodes
>        - doesn't look too complicated
>    - cannot submit SQL in distributed mode because of the bad optimizer
>    - recent Review Board patches
>        - describe code more completely
>        - hard to review without docs
>        - Julien - single PowerPoint slide per operator
>        - Google Doc? Like the logical plan doc
> 
> Ben
>    - code gen portion of merging receiver
>    - no blockers
>        - getting to code review soon
> 
> Julian
>    - joined Hortonworks
>    - working on Optiq
>    - helping Hive, but also working on Drill
>    - making Optiq everything it can be
>    - splitting JDBC into a thin client
>        - thinking about it, no implementation yet
>        - right now pushing sorts down to Mongo
>    - Jacques: session next week on JDBC?
>    - Optiq roadmap
>        - commit logs tell some of the story
>        - a roadmap would be helpful
>        - will put out a call for Optiq users like Drill
>        - put together a feature list for the next release(s)
>        - next 6 months: wants to stay agile, but also be more predictable
>        - Jinfeng will be working on the optimizer and Optiq
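
Lisen's pre-fetch point in the notes (don't wait on I/O all the time, but don't let an operator such as a join get pushed a huge amount of data it isn't ready for) can be illustrated with a generic bounded read-ahead buffer. The following is a minimal Python sketch under that assumption, not Drill's execution model; `prefetching` and its `capacity` parameter are hypothetical names chosen for illustration. The consumer still pulls each item, while a background thread reads ahead by at most `capacity` items, so I/O overlaps with processing and a full buffer pushes back on the upstream source.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

_DONE = object()  # sentinel: the upstream source is exhausted


class _Failure:
    """Carries an exception raised by the upstream source to the consumer."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def prefetching(source: Iterable[T], capacity: int = 4) -> Iterator[T]:
    """Hypothetical pull-based iterator that reads at most `capacity` items ahead."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    buf: "queue.Queue[object]" = queue.Queue(maxsize=capacity)

    def producer() -> None:
        try:
            for item in source:
                # put() blocks while the buffer is full, so the upstream never
                # runs more than `capacity` items ahead (back-pressure).
                buf.put(item)
        except BaseException as exc:  # forward failures instead of hanging
            buf.put(_Failure(exc))
            return
        buf.put(_DONE)

    # Daemon thread: if the consumer stops early, the blocked producer does
    # not keep the process alive (a real engine would cancel it explicitly).
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()  # the consumer still decides when to pull
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item  # type: ignore[misc]
```

Usage would look like `for batch in prefetching(read_batches(), capacity=8): ...`, where `read_batches` stands for any hypothetical upstream reader. Keeping the consumer in control of when it pulls is what separates this from a push model, in which the producer sets the pace.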

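The HyperLogLog item in Lisen's notes (space saving, approximate but acceptable) refers to the probabilistic distinct-count algorithm of Flajolet et al. A minimal Python sketch, assuming SHA-1 as the hash and the standard bias constant for m >= 128 registers, shows the trade-off: memory is fixed at m = 2^p small registers no matter how many distinct values arrive, and the relative standard error is roughly 1.04/sqrt(m). The class name and the `p` parameter are illustrative; this is not Drill's implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-value sketch (illustrative; not Drill code)."""

    def __init__(self, p: int = 12) -> None:
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p                   # number of registers
        self.registers = [0] * self.m     # memory is fixed at m small counters
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as a uniformly distributed hash.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        w = 64 - self.p
        idx = h >> w                      # top p bits choose a register
        rest = h & ((1 << w) - 1)         # remaining w bits
        rank = w - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Harmonic mean of 2**register, scaled by alpha * m**2.
        estimate = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on the empty registers.
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

Because each register only keeps a maximum, re-adding values never changes the estimate, and two sketches built on different nodes could be merged by taking the element-wise maximum of their registers, which is what makes the approach attractive for a distributed engine.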