Hi Vitalii, I added a comment to JIRA.
Best Regards,
Alex

On Wed, Sep 12, 2018 at 6:47 PM Vitalii Diravka <[email protected]> wrote:

> Oleksandr,
>
> You couldn't connect to this hangout meeting, but you can share your ideas
> in reply to our last comment regarding Drill Metastore [1].
> Could you please take a look?
>
> [1]
> https://issues.apache.org/jira/browse/DRILL-6552?focusedCommentId=16612437&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16612437
>
> Kind regards
> Vitalii
>
>
> On Wed, Aug 22, 2018 at 8:28 AM Hanumath Rao Maduri <[email protected]>
> wrote:
>
> > Hangout attendees on 08/21:
> > Pritesh, Salim, Hanumath, Boaz, Robert, Jyothsna, Karthik, Gautam, Vitalii,
> > Vova, Parth, Olek
> >
> > Vitalii and Vova gave a presentation on the Drill Metadata management
> > project.
> >
> > Some of the questions discussed:
> > 1) Gautam suggested using native operators for collecting stats instead of
> > aggregation operators.
> > 2) The metadata API should be made abstract so that the metastore can use
> > DFS, Hive Metastore, etc.
> > 3) Schema change exceptions can be minimized by Hive Metastore but not
> > totally overcome.
> > 4) Discussion on how to refresh the metadata.
> > 5) Caching the metadata, and discussion of what problems the earlier
> > caching solutions had in Drill.
> >
> > Further metadata discussion will be continued in the next hangout.
> >
> > -Hanu
> >
> > On Tue, Aug 21, 2018 at 9:53 AM Vitalii Diravka <[email protected]>
> > wrote:
> >
> > > Hi Alex,
> > >
> > > The issues you pointed out really exist, and the use of HMS is still an
> > > open question.
> > >
> > > The main goal is to make a Drill Metastore API, which can be used for
> > > different Drill data sources, and then to adapt the current Parquet
> > > metadata cache files mechanism to this API.
> > > It will be the first implementation. The second one could be HMS.
> > > Although it has limitations, it also has benefits: it is easy to
> > > leverage in Drill, and a lot of projects already use HMS (Spark,
> > > Presto ...), so for some users it can be a good choice for storing
> > > metadata.
> > >
> > > Other implementations for Drill Metastore could be discussed (MetaCat,
> > > WhereHow, a new implementation of our own based on HBase/MapR-DB).
> > >
> > >
> > > Kind regards
> > > Vitalii
> > >
> > >
> > > On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin <[email protected]>
> > > wrote:
> > >
> > > > Hi Volodymyr,
> > > >
> > > > Recalling recent discussions on the DEV list, it would be interesting
> > > > to see whether the following topics are addressed in the Drill
> > > > metadata management initiative:
> > > >
> > > > 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> > > > Just to substantiate this point of view from practical experience, and
> > > > if we reflect on the ambition to integrate and operate Drill in
> > > > mission-critical environments, the following aspects could be listed:
> > > >   - Need for DBA support if the cluster is subject to service level
> > > > objectives/agreements, which is somewhat remote from the Hadoop world.
> > > > Need for strong DBA skills if the resulting DB workload is challenging
> > > > in terms of performance tuning.
> > > >   - Common RDBMS setups offer an active-standby HA model. In secure
> > > > environments, e.g. environments which are subject to PCI-DSS
> > > > compliance, that implies frequent OS patching and reboots (in reality
> > > > every 30 days max), thus causing additional coordination effort and
> > > > service outages for the duration of the failovers.
> > > >   - Active-active HA clusters like Galera / Percona are free of the
> > > > above disadvantage, but require a specific skill set which is not
> > > > widespread in the DBA community.
> > > > Also, they are sensitive to even disk IO performance across the
> > > > cluster, which may require additional hardware adjustment and IO
> > > > isolation.
> > > >   - Need for a backup / restore mechanism, which is probably the
> > > > lesser of the concerns
> > > >
> > > > 2. Bottleneck in the foreman when performing initial metadata
> > > > collection (and eventually pruning) on a large number of Parquet files
> > > >   - From the discussion on the mailing list it was not fully clear
> > > > whether the metastore will address it
> > > >   - Or should this discussion be continued outside of the metastore
> > > > initiative, from your point of view?
> > > >
> > > > I hope it would be OK with you and Vitalii to share some thoughts on
> > > > this.
> > > >
> > > > Thanks & Best Regards,
> > > > Alex
> > > >
> > > > On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Vitalii Diravka and I want to give a presentation of our ideas
> > > > > for the Drill Metadata management project (DRILL-6552
> > > > > <https://issues.apache.org/jira/browse/DRILL-6552>).
> > > > >
> > > > > We will be happy to discuss it and choose the right way for further
> > > > > development.
> > > > >
> > > > > Kind regards,
> > > > > Volodymyr Vysotskyi
> > > > >
> > > > >
> > > > > On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > The Apache Drill Hangout will be held tomorrow at 10:00am PST;
> > > > > > please let us know should you have a topic for tomorrow's hangout.
> > > > > > We will also ask for topics at the beginning of the hangout.
> > > > > >
> > > > > > Hangout Link -
> > > > > > https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> > > > > >
> > > > > > Regards,
> > > > > > Hanu
