Hi Pritesh,

What time do you think you’d want me to present? Also, should I make some slides?

Best,
- C

> On Sep 15, 2017, at 13:23, Pritesh Maker <pma...@mapr.com> wrote:
>
> Hi All
>
> We are looking forward to hosting the hackathon on Monday. Just a few
> updates on the logistics and agenda:
>
> • We are expecting over 25 people attending the event – you can see the
>   attendee list at the Eventbrite site -
>   https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
>
> • Breakfast will be served starting at 8:30AM – we would like to begin
>   promptly at 9AM
>
> • The agenda has been updated to reflect the speakers (see the update in
>   the sheet -
>   https://docs.google.com/spreadsheets/d/1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 )
>   o Keynote & Introduction – Ted Dunning, Parth Chandra and Aman Sinha
>   o Community Contributions – Anil Kumar, John Omernik, Charles Givre and
>     Ted Dunning
>   o Two tracks for technical design discussions – some topics have initial
>     thoughts written up and some will have open brainstorming discussions
>   o Once the discussions are concluded, we will have summaries presented
>     and notes shared with the community
>
> • We will have a WebEx for the first two sessions. For the two tracks, we
>   will either continue the WebEx or have Hangout links (will publish them
>   to the google sheet)
>   "JOIN WEBEX MEETING
>   https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76
>   Meeting number (access code): 806 111 950
>   Meeting password: ApacheDrill"
>
> • For the attendees in person, we have made bookings for a dinner in the
>   evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas
>
> Looking forward to a fantastic day for the Apache Drill community!
>
> Thanks,
> Pritesh
>
> On 9/5/17, 10:47 PM, "Aman Sinha" <amansi...@apache.org> wrote:
>
> Here is the Eventbrite event for registration:
>
> https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
>
> Please register so we can plan for food and drinks appropriately.
>
> The link also contains a google doc link for the preliminary agenda and a
> 'Topics' tab with a volunteer sign-up column. Please add your name to the
> area(s) of interest.
>
> Thanks and look forward to seeing you all!
>
> -Aman
>
> On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <prog...@mapr.com> wrote:
>
>> A partial list of Drill’s public APIs:
>>
>> IMHO, highest priority for Drill 2.0.
>>
>> * JDBC/ODBC drivers
>> * Client (for JDBC/ODBC) + ODBC & JDBC
>> * Client (for full Drill async, columnar)
>> * Storage plugin
>> * Format plugin
>> * System/session options
>> * Queueing (e.g. ZK-based queues)
>> * REST API
>> * Resource Planning (e.g. max query memory per node)
>> * Metadata access, storage (e.g. file system locations vs. a metastore)
>> * Metadata file formats (Parquet, views, etc.)
>>
>> Lower priority for future releases:
>>
>> * Query Planning (e.g. Calcite rules)
>> * Config options
>> * SQL syntax, especially Drill extensions
>> * UDF
>> * Management (e.g. JMX, REST API calls, etc.)
>> * Drill File System (HDFS)
>> * Web UI
>> * Shell scripts
>>
>> There are certainly more. Please suggest those that are missing. I’ve
>> taken a rough cut at which APIs need forward/backward compatibility
>> first, in part based on those that are the “most public” and most likely
>> to change. Others are important, but we can’t do them all at once.
>>
>> Thanks,
>>
>> - Paul
>>
>> On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansi...@apache.org> wrote:
>>
>> Hi Paul,
>> It certainly makes sense to have the API compatibility discussions during
>> this hackathon. The 2.0 release may be a good checkpoint to introduce
>> breaking changes necessitating changes to the ODBC/JDBC drivers and other
>> external applications. As part of this exercise (not during the hackathon
>> but as a follow-up action), we also should clearly identify the "public"
>> interfaces.
>>
>> I will add this to the agenda.
>>
>> thanks,
>> -Aman
>>
>> On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <prog...@mapr.com> wrote:
>>
>> Thanks Aman for organizing the Hackathon!
>>
>> The list included many good ideas for Drill 2.0. Some of those require
>> changes to Drill’s “public” interfaces (file format, client protocol, SQL
>> behavior, etc.)
>>
>> At present, Drill has no good mechanism to handle backward/forward
>> compatibility at the API level. Protobuf versioning certainly helps, but
>> can’t completely solve semantic changes (where a field changes meaning, or
>> a non-Protobuf data chunk changes format). As just one concrete example,
>> changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class
>> names and data formats will change.
>>
>> Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of
>> breaking changes that introduce a versioning mechanism into our public
>> APIs. Once these are in place, we can evolve the APIs in the future by
>> following the newly-created versioning protocol.
>>
>> Without such a mechanism, we cannot support old & new clients in the same
>> cluster. Nor can we support rolling upgrades. Of course, another solution
>> is to get it right the second time, then freeze all APIs and agree to
>> never again change them. Not sure we have sufficient access to a crystal
>> ball to predict everything we’d ever need in our APIs, however...
>>
>> Thanks,
>>
>> - Paul
>>
>> On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansi...@apache.org> wrote:
>>
>> Drill Developers,
>>
>> In order to kick-start the Drill 2.0 release discussions, I would like to
>> propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day ™ :) ).
>>
>> As I mentioned in the hangout on Tuesday, MapR has offered to host it on
>> Sept 18th at their offices at 350 Holger Way, San Jose. Hope that works
>> for most of you!
>>
>> The goal is to get the community together for a day-long technical
>> discussion on key topics in preparation for a Drill 2.0 release as well as
>> potential improvements in upcoming 1.xx releases. Depending on the
>> interest areas, we could form groups and have a volunteer lead each group.
>>
>> Based on prior discussions on the dev list, hangouts and existing JIRAs,
>> there is already a substantial set of topics and I have summarized a few
>> of them below. What other topics do folks want to talk about? Feel free to
>> respond to this thread and I will create a google doc to consolidate.
>> Understandably, the list would be long but we will use the hackathon to
>> get a sense of a reasonable feature set for 1.xx and 2.0 releases.
>>
>> 1. Metadata management
>>    1a: Defining an abstraction layer for various types of metadata: views,
>>        schema, statistics, security
>>    1b: Underlying storage for metadata: what are the options and their
>>        trade-offs?
>>        - Hive metastore
>>        - Parquet metadata cache (Parquet specific)
>>        - An embedded DBMS
>>        - A distributed key-value store
>>        - Others
>>
>> 2. Drill integration with Apache Arrow
>>    2a: Evaluate the choices and trade-offs
>>
>> 3. Resource management
>>    3a: Memory limits per query
>>    3b: Spilling
>>    3c: Resource management with Drill on Yarn/Mesos/Kubernetes
>>    3d: Local vs. global resource management
>>    3e: Aligning with admission control/queueing
>>
>> 4. TPC-DS coverage and related planner/operator enhancements
>>    4a: Additional set operations: INTERSECT, EXCEPT
>>    4b: GROUPING SETS, ROLLUP, CUBE support
>>    4c: Handling inequality joins and cartesian joins of non-scalar inputs
>>        (via Nested Loop Join)
>>    4d: Remaining gaps in correlated subquery
>>    4e: Statistics: Number of Distinct Values, Histograms
>>
>> 5. Schema handling
>>    5a: Creation, management of schema
>>    5b: Handling schema changes in certain common cases
>>    5c: Schema-awareness
>>    5d: Others TBD
>>
>> 6. Concurrency
>>    6a: What are the bottlenecks to achieving higher concurrency?
>>    6b: Ideas to address these, e.g. async execution?
>>
>> 7. Storage plugins, REST API related enhancements
>>    <Topics TBD>
>>
>> 8. Performance improvements
>>    8a: Filter pushdown
>>    8b: Vectorized Parquet reader
>>    8c: Code-gen improvements
>>    8d: Others TBD