Hello All,

Please find the presentations from the Drill developer day at the link below.

https://drive.google.com/drive/folders/1VjKkqCKghrrbmgAyDY7h2bhoF65QB57r

Thanks,

On Wed, Nov 14, 2018 at 8:32 AM Hanumanth Maduri <[email protected]> wrote:
>
> Hello Drillers,
>
> Here is the webex link for remote attendees.
> Remote attendees can join at
> https://mapr.webex.com/mapr/j.phpMTID=ma05d8b5406acdb6292d5b81c79240a38
>
> Thanks
>
> > On Nov 2, 2018, at 11:25 AM, Abhishek Girish <[email protected]> wrote:
> >
> > Charles, I'm sure we'll have a link for remote folks to join - will share
> > it closer to the day.
> >
> >> On Thu, Nov 1, 2018 at 1:58 PM hanu mapr <[email protected]> wrote:
> >>
> >> Hello All,
> >>
> >> There was a typo for the year in the mail. It should be 2018 instead of 2019.
> >> Thanks Aman for correcting it.
> >>
> >> Regards,
> >> -Hanu
> >>
> >>> On Thu, Nov 1, 2018 at 6:30 AM Charles Givre <[email protected]> wrote:
> >>>
> >>> Hi Hanumath,
> >>> This looks great!! Will you be streaming the event for those of us not in
> >>> the Bay Area?
> >>> Thx,
> >>> -- C
> >>>
> >>>> On Nov 1, 2018, at 00:10, Hanumath Rao Maduri <[email protected]> wrote:
> >>>>
> >>>> Drill Developers,
> >>>>
> >>>> I am quite excited to announce the details of the Drill developers day
> >>>> 2018. I have consolidated the topics from our earlier discussions and
> >>>> prioritized them according to the votes.
> >>>>
> >>>> MapR has offered to host it on Nov 14th in the Training Room downstairs.
> >>>>
> >>>> Here is the exact location:
> >>>>
> >>>> Training Room at
> >>>> 4555 Great America Pkwy, Suite 201, Santa Clara, CA, 95054.
> >>>>
> >>>> Please find the agenda for the meetup.
> >>>>
> >>>> *Lunch starts at 12:00PM.*
> >>>>
> >>>> *[12:25 - 12:40] Welcome*
> >>>>
> >>>> - Recap on last year's activities
> >>>> - Preview of this year's focus
> >>>>
> >>>> *[12:40 - 1:00] Storage plugins*
> >>>>
> >>>> - Adding new storage plugins for the following:
> >>>>   - Netflix Iceberg, Kudu (some code already exists), Cassandra,
> >>>>     Elasticsearch, Carbondata, ORC/XML file formats, Spark
> >>>>     RDD/DataFrames/Datasets, Graph databases & more
> >>>> - Improving documentation related to Storage plugins
> >>>>
> >>>> *[1:00 - 1:45] Schema discovery & Evolution*
> >>>>
> >>>> - Creation, management of schema
> >>>> - Handling schema changes in certain common cases
> >>>> - Handling NULL values elegantly
> >>>> - Schema learning (similar to MSGpack plugin)
> >>>> - Query hints
> >>>>
> >>>> *[1:45 - 2:30] Metadata Management*
> >>>>
> >>>> - Defining an abstraction layer for various types of metadata: views,
> >>>>   schema, statistics, security
> >>>> - Underlying storage for metadata: what are the options and their
> >>>>   trade-offs?
> >>>>   - Hive metastore
> >>>>   - Parquet metadata cache (Parquet specific for row group metadata)
> >>>> - Ease of using the Parquet files generated by other engines (like Spark)
> >>>>
> >>>> *[2:30 - 2:45] Break*
> >>>>
> >>>> *[2:45 - 4:00] Resource management*
> >>>>
> >>>> - Resource limits per query
> >>>> - Optimal memory assignment for blocking operators based on stats
> >>>> - Enhancing the blocking and exchange operators to live within memory
> >>>>   limits
> >>>> - Aligning with admission control/queueing (YARN concepts)
> >>>> - Query scheduling based on queues using tagging and costing
> >>>> - Drill on Kubernetes
> >>>>
> >>>> *[4:00 - 4:20] Apache Arrow*
> >>>>
> >>>> - Benefits of integrating Apache Drill with Apache Arrow
> >>>> - Possible trade-offs & implementation hurdles
> >>>>
> >>>> *[4:20 - 4:40] Performance Improvements*
> >>>>
> >>>> - Efficient handling of Broadcast/Semi/Anti Semi join
> >>>> - Drill Statistics handling
> >>>> - Optimizing complex Parquet reader
> >>>>
> >>>> Thanks,
> >>>> -Hanu
