Thanks for the update.

Looks good to me!

Best,
Jingsong

On Wed, Jun 21, 2023 at 9:59 AM Shammon FY <[email protected]> wrote:
>
> Thanks Jingsong.
>
> As we discussed offline, the `metadata.store` will store the table lineage 
> and data lineage information, which is orthogonal to `metastore`. We can 
> introduce an option `lineage-meta` as follows.
>
> CREATE CATALOG paimon_catalog1 WITH (
>     ... // other options
>     'metastore' = 'hive',
>     'url' = 'XXXXX',
>     'lineage-meta' = 'jdbc',
>     'jdbc.driver' = 'com.mysql.jdbc.Driver',
>     'jdbc.database' = 'paimon_cata1',    // The default Lineage Meta Database name is `paimon`
>     'jdbc.username' = 'XXX',
>     'jdbc.password' = 'XXX'
> );
>
> Then we can support `lineage-meta` for both the `filesystem` and `hive` 
> metastores. I have updated the PIP for the options and the interfaces.
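>
> For illustration only, the `filesystem` case might look like the sketch 
> below. It reuses the options proposed above; the catalog name 
> `paimon_catalog2` and the `warehouse` path are placeholder assumptions, and 
> the final option names should be checked against the PIP.
>
> ```sql
> -- Sketch: filesystem metastore with lineage kept in a JDBC store.
> -- All values are hypothetical placeholders.
> CREATE CATALOG paimon_catalog2 WITH (
>     'type' = 'paimon',
>     'metastore' = 'filesystem',
>     'warehouse' = 'file:///tmp/paimon',   -- placeholder path
>     'url' = 'XXXXX',                      -- copied from the example above
>     'lineage-meta' = 'jdbc',
>     'jdbc.driver' = 'com.mysql.jdbc.Driver',
>     'jdbc.database' = 'paimon_cata1',     -- defaults to `paimon` if omitted
>     'jdbc.username' = 'XXX',
>     'jdbc.password' = 'XXX'
> );
> ```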
>
>
> Best,
> Shammon FY
>
>
> On Tue, Jun 20, 2023 at 8:13 PM Jingsong Li <[email protected]> wrote:
>>
>> Thanks Shammon,
>>
>> For the metadata.store, isn't this just the existing metastore?
>>
>> I mean, can we manage this meta information through the current Catalog
>> interface (which in fact uses `metastore` as its key)?
>>
>> For example,
>>
>> CREATE CATALOG paimon_catalog1 WITH (
>>     ... // other options
>>     'metastore' = 'jdbc',
>>     'url' = 'XXXXX',
>>     'jdbc.driver' = 'com.mysql.jdbc.Driver',
>>     'jdbc.database' = 'paimon_cata1',    // The default Metadata Database name is `paimon`
>>     'jdbc.username' = 'XXX',
>>     'jdbc.password' = 'XXX'
>> );
>>
>> JDBC manages not only the table information (which is what Catalog
>> used to do), but also the data lineage information.
>>
>> What do you think?
>>
>> Or do you still want to separate their responsibilities?
>>
>> Best,
>> Jingsong
>>
>> On Thu, Jun 15, 2023 at 1:46 PM Shammon FY <[email protected]> wrote:
>> >
>> > Hi Jingsong,
>> >
>> > I have updated this PIP and added the implementation for System Database. 
>> > The main changes are as follows:
>> >
>> > 1. Introduce MetadataStore and MetadataStoreFactory to store the data of 
>> > table and data lineages.
>> > 2. Use JDBC as the default metadata store.
>> > 3. Users can query table and data lineage tables, and delete lineages with 
>> > actions
>> >
>> > Looking forward to your feedback, thanks
>> >
>> > Best,
>> > Shammon FY
>> >
>> >
>> > On Wed, Jun 14, 2023 at 11:17 AM Shammon FY <[email protected]> wrote:
>> >>
>> >> Hi Jingsong,
>> >>
>> >> It's a good point about the detailed implementation of System Database, 
>> >> I'll update the PIP soon.
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >> On Wed, Jun 14, 2023 at 8:48 AM Shammon FY <[email protected]> wrote:
>> >>>
>> >>> Hi Jingsong,
>> >>>
>> >>> Thanks for your comments.
>> >>>
>> >>> > We should document what is based on FLIP-314.
>> >>>
>> >>> I have updated the operations supported by FLIP-314 in the future work section.
>> >>>
>> >>> > Is the current Source interface sufficient for your functionality?
>> >>>
>> >>> In our design the current Source interface fulfills our requirements. As 
>> >>> described in PIP-5, `AlignedEnumerator` will send checkpoint events to 
>> >>> `AlignedSourceReader`, which will align the checkpoint and snapshot, and 
>> >>> then send splits to the next operator. More detailed information can be 
>> >>> provided by @liming.
>> >>>
>> >>> > Can we currently achieve the ability to flush all data in a snapshot 
>> >>> > before snapshot?
>> >>>
>> >>> Can you provide a more detailed description of this? Do you mean there 
>> >>> may be too much data for a snapshot if the source aligns the checkpoint 
>> >>> and snapshot and causes the snapshot to be too large to flush?
>> >>>
>> >>>
>> >>> Best,
>> >>> Shammon FY
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Jun 12, 2023 at 4:30 PM Jingsong Li <[email protected]> wrote:
>> >>>>
>> >>>> System Database looks very good~ But perhaps there are some design
>> >>>> details here? What API should we use? Paimon Java API? And should we
>> >>>> commit every operation?
>> >>>>
>> >>>> Best,
>> >>>> Jingsong
>> >>>>
>> >>>> On Mon, Jun 12, 2023 at 4:27 PM Jingsong Li <[email protected]> wrote:
>> >>>> >
>> >>>> > Thanks Shammon,
>> >>>> >
>> >>>> > The overall design looks good to me!
>> >>>> >
>> >>>> > ## Plan For The Future
>> >>>> >
>> >>>> > We should document what is based on FLIP-314.
>> >>>> >
>> >>>> > ## AlignedEnumerator and AlignedSourceReader
>> >>>> >
>> >>>> > Is the current Source interface sufficient for your functionality?
>> >>>> >
>> >>>> > Can we currently achieve the ability to flush all data in a snapshot
>> >>>> > before snapshot?
>> >>>> >
>> >>>> > Best,
>> >>>> > Jingsong
>> >>>> >
>> >>>> > On Mon, Jun 5, 2023 at 7:57 PM Shammon FY <[email protected]> wrote:
>> >>>> > >
>> >>>> > > Hi Kelu,
>> >>>> > >
>> >>>> > > Thanks for your feedback. In the first stage, we do not want to introduce a
>> >>>> > > server, but instead store information directly in the Paimon table when
>> >>>> > > creating and running Flink jobs. A server will be considered when we
>> >>>> > > encounter more requirements in the future and need a resident management
>> >>>> > > service.
>> >>>> > >
>> >>>> > > Best,
>> >>>> > > Shammon FY
>> >>>> > >
>> >>>> > > On Fri, Jun 2, 2023 at 5:55 PM Kelu Tao <[email protected]> wrote:
>> >>>> > >
>> >>>> > > > +1
>> >>>> > > >
>> >>>> > > > cool job ~
>> >>>> > > >
>> >>>> > > > For this PIP, do we need to introduce a new server to serve the
>> >>>> > > > information?
>> >>>> > > >
>> >>>> > > > On 2023/05/31 02:28:21 Shammon FY wrote:
>> >>>> > > > > Hi devs,
>> >>>> > > > >
>> >>>> > > > > We would like to start a discussion about PIP-5: Paimon Table And Data
>> >>>> > > > > Lineage For Flink[1].
>> >>>> > > > >
>> >>>> > > > > Paimon is a streaming lake, and users can use it together with Flink to
>> >>>> > > > > complete the entire ETL processing. In this process, users need to
>> >>>> > > > > manage batch & streaming jobs and data streams, including batch &
>> >>>> > > > > streaming data validation, job debugging, and data revision. To support
>> >>>> > > > > these abilities, we introduce table and data lineage for Flink & Paimon.
>> >>>> > > > > Users can conveniently manage the entire ETL processing based on lineage
>> >>>> > > > > information.
>> >>>> > > > >
>> >>>> > > > > Looking forward to hearing from you, thanks.
>> >>>> > > > >
>> >>>> > > > >
>> >>>> > > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-5%3A+Paimon+Table+And+Data+Lineage+For+Flink
>> >>>> > > > >
>> >>>> > > > >
>> >>>> > > > > Best,
>> >>>> > > > > Shammon FY
>> >>>> > > > >
>> >>>> > > >
