meeting notes 10/22/13

Jason Altekruse Tue, 22 Oct 2013 09:50:47 -0700

Hello All,

Here are the notes from today's hangout. Michael, can you copy them into the
Google doc?

participants: Jacques, Michael Hausenblas, Lisen Mu, Yash Sharma, Jinfeng,
Jason Altekruse, Harri, Steven Phillips, Timothy Chen, Julian Hyde

New employee at MapR: Jinfeng
    - a couple more joining in the next month

Jacques:
    - merged the limit operator
    - clarify VV (value vector) semantics
        - never access the internal state of a VV when it is invalid
    - release notes

Steven:
    - ordered partitioner
        - abstract out a distributed cache interface
    - continue working on spooling to disk

Jason:
    - semi-blocking
        - look at sort and the ordered hash partitioner

Yash
    - naming of functions
        - separate classes for operators and functions, for clarity
            - different operators have their own class files

Lisen
    - fork of Drill
        - data pushed from the leaves rather than pulled from the root
        - we have been thinking about this same problem
            - don't want to wait for IO all the time
            - pre-fetch rather than push
            - in a join you might get pushed a huge amount of data before you are ready for it
            - stream processing
                - alternative concept around foreman
                - not quite right for streams
                - resource allocation
                    - not as much for resource requirements
        - HyperLogLog
            - space saving
            - acceptable, though not precise
        - data assembly - business logic
            - approximations will be important to drill
            - no serious thinking about sampling yet
            - certain types of scanners should support sampling
                - hard for some without reading all the data anyway
                - with HBase it might be easier to do a sampling scan
            - doing it with their own business logic and statistics
                - hard to generalize

Hari
    - not much to update
    - picking up the Amazon EC2 docs
        - had a problem where we need 8 GB of memory
        - cannot get it running on a free micro instance
        - got it working by removing the direct memory flag in the POM
        - Tim: out of memory exception right away
            - was this with or without changing the direct memory option?

Tim
    - wir patch in
    - AMPLab Big Data Benchmark
        - to have numbers for performance evaluation
        - set up on their repo for Drill datasets
        - installing HDFS on all of the nodes
        - doesn't look too complicated
    - cannot submit SQL in distributed mode because of the bad optimizer
    - recent Review Board patches
        - describe code more completely
        - hard to review without docs
        - Julian: a single PowerPoint slide per operator
        - Google doc? like the logical plan doc

Ben
    - code-gen portion of the merging receiver
    - no blockers
        - getting to code review soon

Julian
    - joined hortonworks
    - working on Optiq
    - helping Hive, but also working on Drill
    - making Optiq everything it can be
    - splitting JDBC into a thin client
        - thinking about it, no implementation yet
        - right now pushing sorts down to Mongo
    - Jacques: session next week on JDBC?
    - roadmap for Optiq
        - commit logs tell some of the story
        - roadmap would be helpful
        - will put out a call for Optiq users like Drill
        - put together a feature list for the next release(s)
        - next 6 months: wants to stay agile but be more predictable
        - Jinfeng will be working on the optimizer and Optiq
