Re: Drill 2.0 (design) hackathon

Charles Givre Fri, 15 Sep 2017 14:55:50 -0700

Hi Pritesh, 
What time do you think you’d want me to present?  Also, should I make some 
slides?  
Best,
— C


> On Sep 15, 2017, at 13:23, Pritesh Maker <[email protected]> wrote:
> 
> Hi All
> 
> We are looking forward to hosting the hackathon on Monday. Here are a few 
> updates on the logistics and agenda:
> 
> • We are expecting over 25 people to attend the event – you can see the 
> attendee list at the Eventbrite site -  
> https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
>  
> 
> • Breakfast will be served starting at 8:30AM – we would like to begin 
> promptly at 9AM 
> 
> • The agenda has been updated to reflect the speakers (see the update in the 
> sheet - 
> https://docs.google.com/spreadsheets/d/1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0
>  )
> o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha 
> o Community Contributions – Anil Kumar, John Omernik, Charles Givre and Ted 
> Dunning 
> o Two tracks for technical design discussions – some topics already have 
> initial thoughts written up, and others will be open brainstorming discussions
> o Once the discussions are concluded, we will have summaries presented and 
> notes shared with the community
> 
> • We will have a WebEx for the first two sessions. For the two tracks, we 
> will either continue the WebEx or have Hangout links (we will publish them in 
> the Google sheet)
> "JOIN WEBEX MEETING
> https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76
> Meeting number (access code): 806 111 950
> Meeting password: ApacheDrill"
> 
> • For the attendees in person, we have made bookings for a dinner in the 
> evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas 
> 
> Looking forward to a fantastic day for the Apache Drill community!
> 
> Thanks,
> Pritesh
> 
> 
> 
> On 9/5/17, 10:47 PM, "Aman Sinha" <[email protected]> wrote:
> 
>    Here is the Eventbrite event for registration:
> 
>    
> https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
> 
>    Please register so we can plan for food and drinks appropriately.
> 
>    The link also contains a google doc link for the preliminary agenda and a
>    'Topics' tab with volunteer sign-up column.  Please add your name to the
>    area(s) of interest.
> 
>    Thanks, and I look forward to seeing you all!
> 
>    -Aman
> 
>    On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <[email protected]> wrote:
> 
>> A partial list of Drill’s public APIs:
>> 
>> IMHO, highest priority for Drill 2.0.
>> 
>> 
>>  *   JDBC/ODBC drivers
>>  *   Client (for JDBC/ODBC) + ODBC & JDBC
>>  *   Client (for full Drill async, columnar)
>>  *   Storage plugin
>>  *   Format plugin
>>  *   System/session options
>>  *   Queueing (e.g. ZK-based queues)
>>  *   REST API
>>  *   Resource Planning (e.g. max query memory per node)
>>  *   Metadata access, storage (e.g. file system locations vs. a metastore)
>>  *   Metadata file formats (Parquet, views, etc.)
>> 
>> Lower priority for future releases:
>> 
>> 
>>  *   Query Planning (e.g. Calcite rules)
>>  *   Config options
>>  *   SQL syntax, especially Drill extensions
>>  *   UDF
>>  *   Management (e.g. JMX, REST API calls, etc.)
>>  *   Drill File System (HDFS)
>>  *   Web UI
>>  *   Shell scripts
>> 
>> There are certainly more. Please suggest those that are missing. I’ve
>> taken a rough cut at which APIs need forward/backward compatibility first,
>> in part based on those that are the “most public” and most likely to
>> change. Others are important, but we can’t do them all at once.
>> 
>> Thanks,
>> 
>> - Paul
>> 
>> On Aug 29, 2017, at 6:00 PM, Aman Sinha <[email protected]> wrote:
>> 
>> Hi Paul,
>> It certainly makes sense to have the API compatibility discussions during this
>> hackathon.  The 2.0 release may be a good checkpoint to introduce breaking
>> changes necessitating changes to the ODBC/JDBC drivers and other external
>> applications. As part of this exercise (not during the hackathon but as a
>> follow-up action), we also should clearly identify the "public" interfaces.
>> 
>> 
>> I will add this to the agenda.
>> 
>> thanks,
>> -Aman
>> 
>> On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <[email protected]> wrote:
>> 
>> Thanks Aman for organizing the Hackathon!
>> 
>> The list included many good ideas for Drill 2.0. Some of those require
>> changes to Drill’s “public” interfaces (file format, client protocol, SQL
>> behavior, etc.)
>> 
>> At present, Drill has no good mechanism to handle backward/forward
>> compatibility at the API level. Protobuf versioning certainly helps, but
>> can’t completely solve semantic changes (where a field changes meaning, or
>> a non-Protobuf data chunk changes format.) As just one concrete example,
>> changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class
>> names and data formats will change.
>> 
>> Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of
>> breaking changes that introduce a versioning mechanism into our public
>> APIs. Once these are in place, we can evolve the APIs in the future by
>> following the newly-created versioning protocol.
>> 
>> Without such a mechanism, we cannot support old & new clients in the same
>> cluster. Nor can we support rolling upgrades. Of course, another solution
>> is to get it right the second time, then freeze all APIs and agree to never
>> again change them. Not sure we have sufficient access to a crystal ball to
>> predict everything we’d ever need in our APIs, however...
>> 
>> Thanks,
>> 
>> - Paul
>> 
>> On Aug 24, 2017, at 8:39 AM, Aman Sinha <[email protected]> wrote:
>> 
>> Drill Developers,
>> 
>> In order to kick-start the Drill 2.0 release discussions, I would like to
>> propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day ™ :) ).
>> 
>> As I mentioned in the hangout on Tuesday,  MapR has offered to host it on
>> Sept 18th at their offices at 350 Holger Way, San Jose.   Hope that works
>> for most of you!
>> 
>> The goal is to get the community together for a day-long technical
>> discussion on key topics in preparation for a Drill 2.0 release as well as
>> potential improvements in upcoming 1.xx releases.  Depending on the
>> interest areas, we could form groups and have a volunteer lead each group.
>> 
>> Based on prior discussions on the dev list, hangouts and existing JIRAs,
>> there is already a substantial set of topics and I have summarized a few of
>> them below.   What other topics do folks want to talk about?   Feel free to
>> respond to this thread and I will create a google doc to consolidate.
>> Understandably, the list would be long but we will use the hackathon to get
>> a sense of a reasonable feature set for 1.xx and 2.0 releases.
>> 
>> 
>> 1. Metadata management.
>> 
>> 1a: Defining an abstraction layer for various types of metadata: views,
>> schema, statistics, security
>> 
>> 1b: Underlying storage for metadata: what are the options and their
>> trade-offs?
>> 
>>    - Hive metastore
>> 
>>    - Parquet metadata cache (parquet specific)
>> 
>>    - An embedded DBMS
>> 
>>    - A distributed key-value store
>> 
>>    - Others..
>> 
>> 
>> 
>> 2. Drill integration with Apache Arrow
>> 
>> 2a: Evaluate the choices and tradeoffs
>> 
>> 
>> 
>> 3. Resource management
>> 
>> 3a: Memory limits per query
>> 
>> 3b: Spilling
>> 
>> 3c: Resource management with Drill on Yarn/Mesos/Kubernetes
>> 
>> 3d: Local vs. global resource management
>> 
>> 3e: Aligning with admission control/queueing
>> 
>> 
>> 
>> 4. TPC-DS coverage and related planner/operator enhancements
>> 
>> 4a: Additional set operations: INTERSECT, EXCEPT
>> 
>> 4b: GROUPING SETS, ROLLUP, CUBE support
>> 
>> 4c: Handling inequality joins and Cartesian joins of non-scalar inputs
>> (via Nested Loop Join)
>> 
>> 4d: Remaining gaps in correlated subquery
>> 
>> 4e: Statistics: Number of Distinct Values, Histograms
>> 
>> 
>> 
>> 5. Schema handling
>> 
>> 5a: Creation, management of schema
>> 
>> 5b: Handling schema changes in certain common cases
>> 
>> 5c: Schema-awareness
>> 
>> 5d: Others TBD
>> 
>> 
>> 
>> 6. Concurrency
>> 
>> 6a: What are the bottlenecks to achieving higher concurrency?
>> 
>> 6b: Ideas to address these, e.g. async execution?
>> 
>> 
>> 
>> 7. Storage plugin and REST API related enhancements
>> 
>>  <Topics TBD>
>> 
>> 
>> 
>> 8. Performance improvements
>> 
>> 8a: Filter pushdown
>> 
>> 8b: Vectorized Parquet reader
>> 
>> 8c: Code-gen improvements
>> 
>> 8d: Others TBD
>> 
>> 
>> 
>> 
> 
>
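
Paul's point in the quoted thread, that a versioning mechanism in the public APIs is what would let old and new clients share a cluster and make rolling upgrades possible, can be illustrated with a minimal sketch of version negotiation. This is not Drill's actual protocol or API: the names VersionRange and negotiate, and the idea that each side advertises an inclusive range of integer versions, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRange:
    """Inclusive range of API versions one side can speak (hypothetical type)."""
    min_version: int
    max_version: int


def negotiate(client: VersionRange, server: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if the ranges do not overlap."""
    highest_common = min(client.max_version, server.max_version)
    lowest_common = max(client.min_version, server.min_version)
    return highest_common if highest_common >= lowest_common else None


# A server that still accepts version 1 clients, as it might during a rolling upgrade.
server = VersionRange(min_version=1, max_version=2)
old_client = VersionRange(min_version=1, max_version=1)
new_client = VersionRange(min_version=2, max_version=3)
too_new_client = VersionRange(min_version=3, max_version=4)

print(negotiate(old_client, server))      # 1
print(negotiate(new_client, server))      # 2
print(negotiate(too_new_client, server))  # None
```

Under these assumptions, the old and new clients each settle on a version the server understands, while a client with no overlapping version is rejected up front instead of failing later on a semantic mismatch.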
