Hi Shammon FY,

I see your point, but a Listener is mainly about notification. As you
mentioned, DDL and commit information can already be queried through
APIs. But to learn whether that information has changed, we would have
to poll the tables constantly, which is resource-intensive, especially
when there are many tables. With a Listener, we can detect state
changes promptly.

Consider a separate Table Service that is required to compact all
tables, with the compaction parameters stored in the table options.
When a table's options change, the Table Service needs to be notified
promptly so that it can decide whether to compact the table
immediately. Likewise, when new data is committed to a table, the
service needs to detect it promptly to decide whether to compact.
Users also need CommitEvent to trigger downstream tasks based on a
table's watermark.

Querying compaction information through SQL or APIs is indeed a good
approach, and querying historical compaction records that way is
relatively simple. However, to know the current compaction status of a
table, a Listener may be simpler.
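
To make the push-versus-poll contrast concrete, here is a minimal
sketch of the Table Service case. Everything in it is an assumption
made for illustration: FakeCatalog, OptionsChangedEvent,
CompactionService and the "compaction.enabled" option key are made-up
names, not the interfaces defined in the PIP and not existing Paimon
APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListenerSketch {

    /** Hypothetical event; not the CommitEvent/CompactEvent from the PIP. */
    static final class OptionsChangedEvent {
        final String table;
        final Map<String, String> options;

        OptionsChangedEvent(String table, Map<String, String> options) {
            this.table = table;
            this.options = options;
        }
    }

    /** Hypothetical callback interface. */
    interface Listener {
        void onEvent(OptionsChangedEvent event);
    }

    /** In-memory stand-in for a catalog; counts reads to show polling cost. */
    static final class FakeCatalog {
        private final Map<String, Map<String, String>> tables = new HashMap<>();
        private final List<Listener> listeners = new ArrayList<>();
        int optionReads = 0;

        void register(Listener listener) {
            listeners.add(listener);
        }

        void alterOptions(String table, Map<String, String> options) {
            Map<String, String> copy = new HashMap<>(options);
            tables.put(table, copy);
            OptionsChangedEvent event = new OptionsChangedEvent(table, copy);
            for (Listener listener : listeners) {
                listener.onEvent(event); // push: fired only on an actual change
            }
        }

        Map<String, String> readOptions(String table) {
            optionReads++;
            return tables.get(table);
        }
    }

    /** Table service that queues a table for compaction when notified. */
    static final class CompactionService implements Listener {
        final List<String> pending = new ArrayList<>();

        @Override
        public void onEvent(OptionsChangedEvent event) {
            // "compaction.enabled" is a made-up option key for this sketch.
            if ("true".equals(event.options.get("compaction.enabled"))) {
                pending.add(event.table);
            }
        }
    }

    public static void main(String[] args) {
        FakeCatalog catalog = new FakeCatalog();
        CompactionService service = new CompactionService();
        catalog.register(service);

        for (int i = 0; i < 100; i++) {
            catalog.alterOptions("t" + i,
                    Collections.singletonMap("compaction.enabled", "false"));
        }
        // One relevant change among 100 tables.
        catalog.alterOptions("t42",
                Collections.singletonMap("compaction.enabled", "true"));
        System.out.println("pending via listener: " + service.pending);

        // Polling alternative: each round reads every table, changed or not.
        for (int i = 0; i < 100; i++) {
            catalog.readOptions("t" + i);
        }
        System.out.println("reads per polling round: " + catalog.optionReads);
    }
}
```

With the listener, the service does work only when an option actually
changes. With polling, each round costs one read per table no matter
how few tables changed.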

Best
Shidayang

On Mon, Aug 21, 2023 at 23:24, Shammon FY <[email protected]> wrote:
>
> Hi Jocean,
>
> Thanks for your explanation. I still have some issues
>
> 1. What are the ddl events for Paimon used for? If you need to show tables
> for paimon in your system, I think it's better to define table related
> interfaces, and then you can implement them for Paimon, Iceberg and Hudi
> instead of adding a ddl listener in them. It's more general and you can
> even manage other tables such as databases, mongodb and hive.
>
> 2. If some system information in `CompactEvent` is currently missing or
> there's no information about `compact`,  I think a better way is to add
> this system information in Paimon, rather than adding a listener and
> creating an event with the information. Then the external system can get
> the information by SQL or API directly, this is a more reasonable approach.
>
> 3. Also what is the `CommitEvent` used for? Currently we have metrics for
> `Commit` and jobs can report them. How about adding a customized reporter
> for metrics instead of a listener for `CommitEvent`?
>
> Best,
> Shammon FY
>
>
>
>
> On Mon, Aug 21, 2023 at 5:16 PM Jocean shi <[email protected]> wrote:
>
> > Hi Shammon FY,
> >
> > Thanks for your comments. I’d like to share my thoughts about your
> > comments.
> >
> > 1. Public Interface
> > Thank you for the reminder. I overlooked the correspondence between
> > the Public Interface of PIP and the "@Public" annotation.
> > My idea was that Event, Listener, and ListenerFactory are public,
> > while the others are non-public.
> >
> > 2.  Add `Factory` to create `Listener`
> > Great suggestion, I have already added the ListenerFactory to PIP.
> >
> > 3. Flink and Spark support meta-data listeners
> > It will be very inconvenient for users to obtain DDL information
> > through engines. Firstly, there are many implementations of various
> > engines that need to be connected. Secondly, in addition to Flink and
> > Spark, many engines do not support meta-data listeners. As a general
> > data lake, Paimon should have its own mechanism for meta-data
> > listeners.
> >
> > 4. report events such as commit/compact to an external system
> > CompactEvent: Currently, the compact state is a black box, and users
> > cannot obtain the information through SQL or API.
> > CommitEvent: Currently, the methods of querying through SQL or API are
> > based on polling, which makes it difficult for users to perceive
> > commit operations in a timely manner and consumes a lot of resources.
> >
> > Best
> > Shidayang
> >
> > On Fri, Aug 18, 2023 at 14:07, Shammon FY <[email protected]> wrote:
> > >
> > > Thanks @Jocean for starting this discussion, I have some comments
> > >
> > > 1. About the public interfaces in the PIP, we should add @Public for them
> > > such as `Event`, `Listener` and even `CommitEvent` and other events. But
> > > for `Listeners`, I don't think it should be a public interface. All
> > > fields in the public interface for users should be `Public` too, but I
> > > found the information such as `ManifestEntry` in `CommitEvent` is not a
> > > public interface. I think you may need to reconsider which interfaces
> > > need to be marked with @Public and which are not.
> > >
> > > 2. In general, it is better to give a `Factory` to create `Listener`
> > > which should be all marked as `@Public` and you can see
> > > `CatalogFactory`->`Catalog` as an example.
> > >
> > > 3. Currently Flink and Spark support meta-data listeners and we can
> > > support reporting ddl information there. Do we need to add the same
> > > listener in Paimon?
> > >
> > > 4. Do we need to report events such as commit/compact to an external
> > > system? Currently we have some system tables and users can get this
> > > information by SQL or API. Should the external system query this
> > > information regularly instead of having a listener push it?
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > > On Tue, Aug 15, 2023 at 11:08 AM Jocean shi <[email protected]> wrote:
> > >
> > > > Hi devs:
> > > >
> > > > We would like to start a discussion about PIP-8: Introduce listeners
> > > > for Paimon[1].
> > > >
> > > > In production environments, users often need to perceive the state
> > > > changes of Paimon table,
> > > > such as whether a new file has been committed to the table, in which
> > > > partitions the committed files are,
> > > > the size and number of the committed files, the status and type of
> > > > compaction, operations like table creation, deletion, and schema
> > > > changes, etc.
> > > > So, we introduce a Listener system for Paimon.
> > > > Looking forward to hearing from you.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-8%3A+Introduce+listeners+for+Paimon
> > > >
> > > > Best
> > > > shidayang
> > > >
> >
