Thanks Jingsong.

As we discussed offline, the `metadata.store` will store the table lineage
and data lineage information, which is orthogonal to the `metastore`. We can
introduce an option `lineage-meta` as follows.

CREATE CATALOG paimon_catalog1 WITH (
    ... // other options
    'metastore' = 'hive',
    'url' = 'XXXXX',
    'lineage-meta' = 'jdbc',
    'jdbc.driver' = 'com.mysql.jdbc.Driver',
    'jdbc.database' = 'paimon_cata1',    // The default Lineage Meta Database name is `paimon`
    'jdbc.username' = 'XXX',
    'jdbc.password' = 'XXX'
);

Then we can support `lineage-meta` for both the `filesystem` and `hive`
metastores. I have updated the PIP with the new options and interfaces.
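
A minimal sketch of that separation, written in Python for brevity and not taken from the PIP: it splits the catalog options from the example above into the part that belongs to the metastore and the part that belongs to the lineage store. The function name `split_catalog_options` is hypothetical, and the rule that keys prefixed with the `lineage-meta` value (here `jdbc.`) configure the lineage store is an assumption made only for illustration.

```python
from typing import Dict, Optional, Tuple


def split_catalog_options(
    options: Dict[str, str],
) -> Tuple[Dict[str, str], Optional[str], Dict[str, str]]:
    """Split catalog options into metastore options and lineage-meta options.

    Hypothetical helper for illustration only; not part of Paimon.
    """
    lineage_kind = options.get("lineage-meta")  # e.g. 'jdbc', or None if unset
    metastore_opts: Dict[str, str] = {}
    lineage_opts: Dict[str, str] = {}
    for key, value in options.items():
        if key == "lineage-meta":
            continue
        # Assumption: keys prefixed with the lineage-meta kind ("jdbc.")
        # configure the lineage store; everything else stays with the metastore.
        if lineage_kind is not None and key.startswith(lineage_kind + "."):
            lineage_opts[key] = value
        else:
            metastore_opts[key] = value
    return metastore_opts, lineage_kind, lineage_opts


# Options copied from the CREATE CATALOG example (placeholders kept as-is).
options = {
    "metastore": "hive",
    "url": "XXXXX",
    "lineage-meta": "jdbc",
    "jdbc.driver": "com.mysql.jdbc.Driver",
    "jdbc.database": "paimon_cata1",
    "jdbc.username": "XXX",
    "jdbc.password": "XXX",
}
metastore_opts, lineage_kind, lineage_opts = split_catalog_options(options)
print(metastore_opts["metastore"], lineage_kind, sorted(lineage_opts))
```

Under this assumption, changing `metastore` from `hive` to `filesystem` leaves the lineage options untouched, which is the sense in which the two settings are orthogonal. A real implementation would also need a rule for the case where the metastore itself uses the same key prefix.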


Best,
Shammon FY


On Tue, Jun 20, 2023 at 8:13 PM Jingsong Li <[email protected]> wrote:

> Thanks Shammon,
>
> For the metadata.store, is this just now the metastore?
>
> I mean can we manage this meta information through the current Catalog
> interface (which is in fact metastore as a key)?
>
> For example,
>
> CREATE CATALOG paimon_catalog1 WITH (
>     ... // other options
>     'metastore' = 'jdbc',
>     'url' = 'XXXXX',
>     'jdbc.driver' = 'com.mysql.jdbc.Driver',
>     'jdbc.database' = 'paimon_cata1',    // The default Metadata Database name is `paimon`
>     'jdbc.username' = 'XXX',
>     'jdbc.password' = 'XXX'
> );
>
> JDBC manages not only the table information (which is what Catalog
> used to do), but also the data lineage information.
>
> What do you think?
>
> Or do you still want to separate their responsibilities?
>
> Best,
> Jingsong
>
> On Thu, Jun 15, 2023 at 1:46 PM Shammon FY <[email protected]> wrote:
> >
> > Hi Jingsong,
> >
> > I have updated this PIP and added the implementation for System
> > Database. The main changes are as follows:
> >
> > 1. Introduce MetadataStore and MetadataStoreFactory to store the data of
> >    table and data lineages.
> > 2. Use jdbc as the default metadata store.
> > 3. Users can query table and data lineage tables, and delete lineages
> >    with actions.
> >
> > Looking forward to your feedback, thanks
> >
> > Best,
> > Shammon FY
> >
> >
> > On Wed, Jun 14, 2023 at 11:17 AM Shammon FY <[email protected]> wrote:
> >>
> >> Hi Jingsong,
> >>
> >> It's a good point about the detailed implementation of System Database,
> >> I'll update the PIP soon.
> >>
> >> Best,
> >> Shammon FY
> >>
> >> On Wed, Jun 14, 2023 at 8:48 AM Shammon FY <[email protected]> wrote:
> >>>
> >>> Hi Jingsong,
> >>>
> >>> Thanks for your comments.
> >>>
> >>> > We should document what is based on FLIP-314.
> >>>
> >>> I have updated the future work section with the operations that are
> >>> based on FLIP-314.
> >>>
> >>> > Is the current Source interface sufficient for your functionality?
> >>>
> >>> In our design the current Source interface fulfills our requirements.
> >>> As described in PIP-5, `AlignedEnumerator` will send checkpoint events to
> >>> `AlignedSourceReader`, which will align the checkpoint and snapshot, and
> >>> then send the split to the next operator. More detailed information can be
> >>> provided by @liming.
> >>>
> >>> > Can we currently achieve the ability to flush all data in a snapshot
> >>> > before snapshot?
> >>>
> >>> Can you provide a more detailed description of this? Do you mean that,
> >>> if the source aligns the checkpoint and snapshot, a snapshot may contain
> >>> too much data and become too large to flush?
> >>>
> >>>
> >>> Best,
> >>> Shammon FY
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, Jun 12, 2023 at 4:30 PM Jingsong Li <[email protected]> wrote:
> >>>>
> >>>> System Database looks very good~ But perhaps there are some design
> >>>> details here? What API should we use? Paimon Java API? And we should
> >>>> commit every operation?
> >>>>
> >>>> Best,
> >>>> Jingsong
> >>>>
> >>>> On Mon, Jun 12, 2023 at 4:27 PM Jingsong Li <[email protected]> wrote:
> >>>> >
> >>>> > Thanks Shammon,
> >>>> >
> >>>> > The overall design looks good to me!
> >>>> >
> >>>> > ## Plan For The Future
> >>>> >
> >>>> > We should document what is based on FLIP-314.
> >>>> >
> >>>> > ## AlignedEnumerator and AlignedSourceReader
> >>>> >
> >>>> > Is the current Source interface sufficient for your functionality?
> >>>> >
> >>>> > Can we currently achieve the ability to flush all data in a snapshot
> >>>> > before snapshot?
> >>>> >
> >>>> > Best,
> >>>> > Jingsong
> >>>> >
> >>>> > On Mon, Jun 5, 2023 at 7:57 PM Shammon FY <[email protected]> wrote:
> >>>> > >
> >>>> > > Hi Kelu,
> >>>> > >
> >>>> > > Thanks for your feedback. In the first stage, we do not want to
> >>>> > > introduce a server, but instead store information directly in the
> >>>> > > Paimon table when creating and running Flink jobs. A server will be
> >>>> > > considered when we encounter more requirements in the future and need
> >>>> > > a resident management service.
> >>>> > >
> >>>> > > Best,
> >>>> > > Shammon FY
> >>>> > >
> >>>> > > On Fri, Jun 2, 2023 at 5:55 PM Kelu Tao <[email protected]> wrote:
> >>>> > >
> >>>> > > > +1
> >>>> > > >
> >>>> > > > cool job ~
> >>>> > > >
> >>>> > > > For this PIP, do we need to introduce a new server to serve the
> >>>> > > > information?
> >>>> > > >
> >>>> > > > On 2023/05/31 02:28:21 Shammon FY wrote:
> >>>> > > > > Hi devs,
> >>>> > > > >
> >>>> > > > > We would like to start a discussion about PIP-5: Paimon Table
> >>>> > > > > And Data Lineage For Flink[1].
> >>>> > > > >
> >>>> > > > > Paimon is a streaming lake, and users can integrate it with Flink
> >>>> > > > > to complete the entire ETL process. In this process, users need to
> >>>> > > > > manage batch & streaming jobs and data streams, including batch &
> >>>> > > > > streaming data validation, job debugging, and data revision. To
> >>>> > > > > support these abilities, we introduce table and data lineage for
> >>>> > > > > Flink & Paimon. Users can conveniently manage the entire ETL
> >>>> > > > > process based on lineage information.
> >>>> > > > >
> >>>> > > > > Looking forward to hearing from you, thanks.
> >>>> > > > >
> >>>> > > > >
> >>>> > > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-5%3A+Paimon+Table+And+Data+Lineage+For+Flink
> >>>> > > > >
> >>>> > > > >
> >>>> > > > > Best,
> >>>> > > > > Shammon FY
> >>>> > > > >
> >>>> > > >
>
