Re: Drill Hangout tomorrow 08/21

Oleksandr Kalinin Wed, 19 Sep 2018 08:20:07 -0700

Hi Vitalii,

I added a comment to JIRA.


Best Regards,
Alex

On Wed, Sep 12, 2018 at 6:47 PM Vitalii Diravka <[email protected]>
wrote:

> Oleksandr,
>
> You couldn't connect to this hangout meeting, but you can share your ideas
> in a reply to our last comment regarding Drill Metastore [1].
> Could you please take a look?
>
> [1]
> https://issues.apache.org/jira/browse/DRILL-6552?focusedCommentId=16612437&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16612437
>
> Kind regards
> Vitalii
>
>
> On Wed, Aug 22, 2018 at 8:28 AM Hanumath Rao Maduri <[email protected]>
> wrote:
>
> > Hangout attendees on 08/21:
> > Pritesh, Salim, Hanumath, Boaz, Robert, Jyothsna, Karthik, Gautam,
> > Vitalii, Vova, Parth, Olek
> >
> > Vitalii and Vova gave a presentation on the Drill Metadata management
> > project.
> >
> > Some of the points raised during the discussion:
> > 1) Gautam suggested using native operators for collecting stats instead
> > of aggregation operators.
> > 2) The metadata API should be made abstract so that the metastore can use
> > dfs, the Hive metastore, etc.
> > 3) Schema change exceptions can be minimized by the Hive metastore but not
> > totally overcome.
> > 4) Discussion on how to refresh the metadata.
> > 5) Caching the metadata, and discussion of the problems the earlier
> > caching solutions had in Drill.
> >
> >
> > Further metadata discussion will be continued in the next hangout.
> >
> > -Hanu
> >
> > On Tue, Aug 21, 2018 at 9:53 AM Vitalii Diravka <[email protected]>
> > wrote:
> >
> > > Hi Alex,
> > >
> > > The issues you pointed out really exist, and whether to use HMS is
> > > still an open question.
> > >
> > > The main goal is to create a Drill Metastore API that can be used for
> > > different Drill data sources, and then to adapt the current Parquet
> > > metadata cache files mechanism to this API.
> > > That will be the first implementation. The second one could be HMS.
> > > Although it has limitations, it also has benefits: it is easy to
> > > leverage in Drill, and a lot of projects already use HMS (Spark,
> > > Presto ...), so for some users it can be a good choice for storing
> > > metadata.
> > >
> > > Other implementations of the Drill Metastore could be discussed
> > > (MetaCat, WhereHows, a new implementation of our own based on
> > > HBase/MapR-DB).
> > >
> > >
> > > Kind regards
> > > Vitalii
> > >
> > >
> > > On Tue, Aug 21, 2018 at 7:04 PM Oleksandr Kalinin <[email protected]>
> > > wrote:
> > >
> > > > Hi Volodymyr,
> > > >
> > > > Recalling recent discussions on the DEV list, it would be
> > > > interesting to see whether the following topics are addressed in the
> > > > Drill metadata management initiative:
> > > >
> > > > 1. Avoiding repetition of Hive mistakes (mainly relying on RDBMS)
> > > > To substantiate this point of view from practical experience, and
> > > > considering the ambition to integrate and operate Drill in a
> > > > mission-critical environment, the following aspects could be listed:
> > > >   - Need for DBA support if the cluster is subject to service level
> > > > objectives/agreements, which is somewhat remote from the Hadoop world.
> > > > Need for strong DBA skills if the resulting DB workload is challenging
> > > > in terms of performance tuning.
> > > >   - Common RDBMS setups offer an active-standby HA model. In secure
> > > > environments, e.g. environments which are subject to PCI-DSS
> > > > compliance, that implies frequent OS patching and reboots (in reality
> > > > every 30 days max), thus causing additional coordination effort and
> > > > service outage for the duration of the failovers.
> > > >   - Active-active HA clusters like Galera / Percona are free of the
> > > > above disadvantage, but require a specific skill set which is not
> > > > widespread in the DBA community. They also depend on even disk IO
> > > > performance across the cluster, which may require additional hardware
> > > > adjustment and IO isolation.
> > > >   - Need for a backup / restore mechanism, which is probably the
> > > > lesser concern
> > > > 2. Bottleneck in the foreman when performing initial metadata
> > > > collection (and eventually pruning) on a large number of Parquet files
> > > >   - From the discussion on the mailing list it was not fully clear
> > > > whether the metastore will address it
> > > >   - Or, from your point of view, should this discussion be continued
> > > > outside of the metastore initiative?
> > > >
> > > > I hope it would be OK with you and Vitalii to share some thoughts on
> > > > this.
> > > >
> > > > Thanks & Best Regards,
> > > > Alex
> > > >
> > > > On Mon, Aug 20, 2018 at 10:50 PM Volodymyr Vysotskyi <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Vitalii Diravka and I would like to give a presentation of our
> > > > > ideas for the Drill Metadata management project (DRILL-6552
> > > > > <https://issues.apache.org/jira/browse/DRILL-6552>).
> > > > >
> > > > > We will be happy to discuss it and choose the right way forward for
> > > > > further development.
> > > > >
> > > > > Kind regards,
> > > > > Volodymyr Vysotskyi
> > > > >
> > > > >
> > > > > On Mon, Aug 20, 2018 at 10:35 PM Hanumath Rao Maduri <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > The Apache Drill Hangout will be held tomorrow at 10:00am PST;
> > > > > > please let us know should you have a topic for tomorrow's
> > > > > > hangout. We will also ask for topics at the beginning of the
> > > > > > hangout.
> > > > > >
> > > > > > Hangout Link -
> > > > > > https://hangouts.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc
> > > > > >
> > > > > > Regards,
> > > > > > Hanu
> > > > > >
> > > > >
> > > >
> > >
> >
>
