Thanks for the update. Looks good to me!

Best,
Jingsong

On Wed, Jun 21, 2023 at 9:59 AM Shammon FY wrote:
>
> Thanks Jingsong.
>
> As we discussed offline, the `metadata.store` will store the table lineage and data lineage information, which is orthogonal to `metastore`. We can introduce an option `lineage-meta` as follows.
>
> CREATE CATALOG paimon_catalog1 WITH (
>   -- ... other options
>   'metastore' = 'hive',
>   'url' = 'XXXXX',
>   'lineage-meta' = 'jdbc',
>   'jdbc.driver' = 'com.mysql.jdbc.Driver',
>   'jdbc.database' = 'paimon_cata1',  -- the default lineage meta database name is `paimon`
>   'jdbc.username' = 'XXX',
>   'jdbc.password' = 'XXX'
> );
>
> Then we can support `lineage-meta` for both the `filesystem` and `hive` metastores. I have updated the PIP with the options and the interfaces.
>
>
> Best,
> Shammon FY
>
>
> On Tue, Jun 20, 2023 at 8:13 PM Jingsong Li wrote:
>>
>> Thanks Shammon,
>>
>> For the metadata.store, is this just the metastore we already have?
>>
>> I mean, can we manage this meta information through the current Catalog interface (which is in fact what the `metastore` key selects)?
>>
>> For example,
>>
>> CREATE CATALOG paimon_catalog1 WITH (
>>   -- ... other options
>>   'metastore' = 'jdbc',
>>   'url' = 'XXXXX',
>>   'jdbc.driver' = 'com.mysql.jdbc.Driver',
>>   'jdbc.database' = 'paimon_cata1',  -- the default metadata database name is `paimon`
>>   'jdbc.username' = 'XXX',
>>   'jdbc.password' = 'XXX'
>> );
>>
>> JDBC would then manage not only the table information (which is what the Catalog already does), but also the data lineage information.
>>
>> What do you think?
>>
>> Or do you still want to separate their responsibilities?
>>
>> Best,
>> Jingsong
>>
>> On Thu, Jun 15, 2023 at 1:46 PM Shammon FY wrote:
>> >
>> > Hi Jingsong,
>> >
>> > I have updated this PIP and added the implementation details for the System Database. The main changes are as follows:
>> >
>> > 1. Introduce MetadataStore and MetadataStoreFactory to store table and data lineage information.
>> > 2. Use JDBC as the default metadata store.
>> > 3. Users can query the table and data lineage tables, and delete lineages with actions.
>> >
>> > Looking forward to your feedback, thanks
>> >
>> > Best,
>> > Shammon FY
>> >
>> >
>> > On Wed, Jun 14, 2023 at 11:17 AM Shammon FY wrote:
>> >>
>> >> Hi Jingsong,
>> >>
>> >> It's a good point about the detailed implementation of the System Database. I'll update the PIP soon.
>> >>
>> >> Best,
>> >> Shammon FY
>> >>
>> >> On Wed, Jun 14, 2023 at 8:48 AM Shammon FY wrote:
>> >>>
>> >>> Hi Jingsong,
>> >>>
>> >>> Thanks for your comments.
>> >>>
>> >>> > We should document what is based on FLIP-314.
>> >>>
>> >>> I have listed the operations supported by FLIP-314 in the future work section.
>> >>>
>> >>> > Is the current Source interface sufficient for your functionality?
>> >>>
>> >>> In our design, the current Source interface fulfills our requirements. As described in PIP-5, `AlignedEnumerator` will send checkpoint events to `AlignedSourceReader`, which will align the checkpoint and snapshot and then send splits to the next operator. @liming can provide more detailed information.
>> >>>
>> >>> > Can we currently achieve the ability to flush all data in a snapshot
>> >>> > before snapshot?
>> >>>
>> >>> Can you provide a more detailed description of this? Do you mean that, if the source aligns the checkpoint and snapshot, a snapshot may contain too much data to flush?
>> >>>
>> >>>
>> >>> Best,
>> >>> Shammon FY
>> >>>
>> >>>
>> >>> On Mon, Jun 12, 2023 at 4:30 PM Jingsong Li wrote:
>> >>>>
>> >>>> System Database looks very good~ But perhaps some design details are still missing here? What API should we use? The Paimon Java API? And should we commit every operation?
>> >>>>
>> >>>> Best,
>> >>>> Jingsong
>> >>>>
>> >>>> On Mon, Jun 12, 2023 at 4:27 PM Jingsong Li wrote:
>> >>>> >
>> >>>> > Thanks Shammon,
>> >>>> >
>> >>>> > The overall design looks good to me!
>> >>>> >
>> >>>> > ## Plan For The Future
>> >>>> >
>> >>>> > We should document what is based on FLIP-314.
>> >>>> >
>> >>>> > ## AlignedEnumerator and AlignedSourceReader
>> >>>> >
>> >>>> > Is the current Source interface sufficient for your functionality?
>> >>>> >
>> >>>> > Can we currently achieve the ability to flush all data in a snapshot before snapshot?
>> >>>> >
>> >>>> > Best,
>> >>>> > Jingsong
>> >>>> >
>> >>>> > On Mon, Jun 5, 2023 at 7:57 PM Shammon FY wrote:
>> >>>> > >
>> >>>> > > Hi Kelu,
>> >>>> > >
>> >>>> > > Thanks for your feedback. In the first stage, we do not want to introduce a server; instead, we store the information directly in Paimon tables when creating and running Flink jobs. We will consider a server when more requirements come up in the future and a resident management service is needed.
>> >>>> > >
>> >>>> > > Best,
>> >>>> > > Shammon FY
>> >>>> > >
>> >>>> > > On Fri, Jun 2, 2023 at 5:55 PM Kelu Tao wrote:
>> >>>> > >
>> >>>> > > > +1
>> >>>> > > >
>> >>>> > > > cool job ~
>> >>>> > > >
>> >>>> > > > For this PIP, do we need to introduce a new server to serve the information?
>> >>>> > > >
>> >>>> > > > On 2023/05/31 02:28:21 Shammon FY wrote:
>> >>>> > > > > Hi devs,
>> >>>> > > > >
>> >>>> > > > > We would like to start a discussion about PIP-5: Paimon Table And Data Lineage For Flink [1].
>> >>>> > > > >
>> >>>> > > > > Since Paimon is a streaming lake, users can integrate it with Flink to complete the entire ETL process. In this process, users need to manage batch & streaming jobs and data streams, including batch & streaming data validation, job debugging, and data revision. To support these capabilities, we introduce table and data lineage for Flink & Paimon, so that users can conveniently manage the entire ETL process based on lineage information.
>> >>>> > > > >
>> >>>> > > > > Looking forward to hearing from you, thanks.
>> >>>> > > > >
>> >>>> > > > >
>> >>>> > > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-5%3A+Paimon+Table+And+Data+Lineage+For+Flink
>> >>>> > > > >
>> >>>> > > > >
>> >>>> > > > > Best,
>> >>>> > > > > Shammon FY
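
For comparison with the `hive` example in the June 21 message, the same `lineage-meta` proposal applied to a `filesystem` metastore might look roughly like this. It is a sketch that assumes the proposed `lineage-meta` and `jdbc.*` keys behave as described in that message (they were still under discussion in this thread, not a released option set). `paimon_catalog2`, `paimon_cata2`, and the `XXXXX` warehouse path are placeholders:

CREATE CATALOG paimon_catalog2 WITH (
  -- ... other options
  'metastore' = 'filesystem',
  'warehouse' = 'XXXXX',       -- placeholder path; the filesystem metastore keeps table metadata here
  'lineage-meta' = 'jdbc',     -- proposed option: lineage is stored via JDBC, independently of 'metastore'
  'jdbc.driver' = 'com.mysql.jdbc.Driver',
  'jdbc.database' = 'paimon_cata2',  -- per the proposal, defaults to `paimon` if omitted
  'jdbc.username' = 'XXX',
  'jdbc.password' = 'XXX'
);

Here `metastore` only decides where table metadata lives, while `lineage-meta` separately selects where lineage information is stored. This separation is what the thread means by the two options being orthogonal.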
