Hi Shammon FY

Thanks for your comment.

1. DDL events
Many behaviors of the Table service depend on table options, such as
whether full compaction is enabled and the conditions that trigger
compaction. When a table's options change, the Table service needs to
notice promptly and adjust how it handles that table. Without a
listener mechanism, the Table service has to poll each table
constantly to check whether its configuration has changed, which puts
extra load on both Hive and the Table service. If we can listen for
AlterTableEvent, we no longer need to poll the table options.
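
To make the push model concrete, here is a minimal sketch of a Table
service that keeps its view of table options current from alter-table
notifications instead of polling. Everything in it is an assumption for
illustration: SketchCatalog, SketchTableService, AlterTableListener, the
event fields, and the option key "full-compaction.enabled" are
hypothetical and are not the interfaces proposed in the PIP.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the PIP-8 API.

/** Event carrying the table name and its options after an ALTER. */
final class AlterTableEvent {
    final String table;
    final Map<String, String> newOptions;

    AlterTableEvent(String table, Map<String, String> newOptions) {
        this.table = table;
        this.newOptions = newOptions;
    }
}

/** Callback invoked once for every alter-table operation. */
interface AlterTableListener {
    void onAlterTable(AlterTableEvent event);
}

/** Stand-in for the catalog: stores options and notifies listeners. */
final class SketchCatalog {
    private final Map<String, Map<String, String>> options = new HashMap<>();
    private final List<AlterTableListener> listeners = new ArrayList<>();

    void register(AlterTableListener listener) {
        listeners.add(listener);
    }

    void alterTable(String table, Map<String, String> newOptions) {
        options.put(table, new HashMap<>(newOptions));
        // Give listeners their own copy so they cannot mutate catalog state.
        AlterTableEvent event = new AlterTableEvent(table, new HashMap<>(newOptions));
        for (AlterTableListener listener : listeners) {
            listener.onAlterTable(event);
        }
    }
}

/** Stand-in for the Table service: its cache is updated only by events. */
final class SketchTableService implements AlterTableListener {
    final Map<String, Map<String, String>> cachedOptions = new HashMap<>();
    int notifications = 0;

    @Override
    public void onAlterTable(AlterTableEvent event) {
        notifications++;
        cachedOptions.put(event.table, event.newOptions);
    }

    boolean fullCompactionEnabled(String table) {
        Map<String, String> opts = cachedOptions.get(table);
        // "full-compaction.enabled" is a made-up key for this sketch.
        return opts != null && "true".equals(opts.get("full-compaction.enabled"));
    }
}

class DdlListenerDemo {
    public static void main(String[] args) {
        SketchCatalog catalog = new SketchCatalog();
        SketchTableService service = new SketchTableService();
        catalog.register(service);

        Map<String, String> opts = new HashMap<>();
        opts.put("full-compaction.enabled", "true");
        catalog.alterTable("db.t1", opts);

        System.out.println("notifications: " + service.notifications);
        System.out.println("db.t1 full compaction: "
                + service.fullCompactionEnabled("db.t1"));
    }
}
```

The service does no work while nothing changes, and it is called
exactly once per alteration, so the cost scales with the number of
changes rather than with the number of tables.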

2. Why not metrics
Metrics are mainly aggregated statistics that are usually sampled at
regular intervals, so consecutive reported values may be identical.
This is quite different from events. For 'commit', for example, a
metric usually measures the size, number, and duration of recently
committed files, and several reads in a row may return the same
result. Replacing the existing CommitCallback with metrics would
therefore be very cumbersome.
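
The difference can be sketched as follows. This is only an
illustration under assumed names: SketchCommitter, CommitInfo,
CommitListener, and the gauge lastCommitFileCount are hypothetical and
do not correspond to existing Paimon classes or metrics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not Paimon's metrics or callback API.

/** Per-commit payload delivered to listeners. */
final class CommitInfo {
    final long snapshotId;
    final int fileCount;

    CommitInfo(long snapshotId, int fileCount) {
        this.snapshotId = snapshotId;
        this.fileCount = fileCount;
    }
}

/** Callback invoked once for every commit. */
interface CommitListener {
    void onCommit(CommitInfo info);
}

/** Stand-in committer that exposes both a gauge and a listener hook. */
final class SketchCommitter {
    // Gauge-style metric: only the most recent value is visible.
    private volatile int lastCommitFileCount = 0;
    private final List<CommitListener> listeners = new ArrayList<>();
    private long nextSnapshotId = 1;

    int lastCommitFileCount() {
        return lastCommitFileCount;
    }

    void register(CommitListener listener) {
        listeners.add(listener);
    }

    void commit(int fileCount) {
        lastCommitFileCount = fileCount;
        CommitInfo info = new CommitInfo(nextSnapshotId++, fileCount);
        for (CommitListener listener : listeners) {
            listener.onCommit(info);
        }
    }
}

class MetricVsEventDemo {
    public static void main(String[] args) {
        SketchCommitter committer = new SketchCommitter();
        List<Long> seenSnapshots = new ArrayList<>();
        committer.register(info -> seenSnapshots.add(info.snapshotId));

        List<Integer> samples = new ArrayList<>();
        committer.commit(3);
        samples.add(committer.lastCommitFileCount()); // first periodic sample
        committer.commit(3);
        committer.commit(3);
        samples.add(committer.lastCommitFileCount()); // second periodic sample

        // The two samples are identical, so a reader of the gauge cannot
        // tell how many commits happened between them; the listener saw
        // each commit individually.
        System.out.println("gauge samples: " + samples);
        System.out.println("snapshots seen by listener: " + seenSnapshots);
    }
}
```

A consumer that has to act on every commit, such as one triggering
downstream tasks, would need to rebuild per-commit information from the
sampled values, which is what the listener provides directly.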

Best
shidayang

On Wed, Aug 23, 2023 at 10:53 AM Shammon FY <[email protected]> wrote:
>
> Hi Jocean
>
> Thanks for your answer. I think there are two types of the information you
> want to report: the ddl events and the runtime events such as commit,
> compaction.
>
> For the DDL events, I don't quite understand why you need to poll the table
> information regularly. As we all know, Paimon is itself a storage that
> holds all of its meta information, and even when you poll the information
> from Paimon, you need to store it somewhere. I think you can just use Paimon
> itself as that storage. If obtaining Paimon tables is relatively slow, for
> example with the large number of tables you mentioned, I think we should
> improve this, for example by adding a table cache.
>
> For the runtime events, I understand that they do need to be reported to a
> system like `Table Service`. But my question is: can we do this with the
> existing metrics mechanism? For example, by reporting the relevant metrics
> to the `Table Service` instead of adding a new `listener`? If the metrics
> information is not complete enough, we can continue to add information to
> it.
>
> Best,
> Shammon FY
>
> On Tue, Aug 22, 2023 at 2:20 PM Jocean shi <[email protected]> wrote:
>
> > Hi Shammon FY,
> >
> > I get your point, but the role of a Listener is more towards
> > notification. For example, as you mentioned, we can query the relevant
> > information through APIs for DDL and commit information. However, when
> > we want to know if there have been any changes to the relevant
> > information, we need to constantly poll the tables. This mechanism can
> > be resource-intensive, especially when there are many tables. With a
> > Listener, we can promptly detect changes in status. Consider a
> > separate Table service that has a requirement to compact all tables,
> > and the compact parameters are stored in the options. When there is a
> > change in the options of a table, the Table Service needs to be
> > notified promptly to determine whether to immediately compact the
> > table. When there is new data committed to a table, it needs to be
> > promptly detected to determine whether to compact it. Also, users need
> > the assistance of CommitEvent to trigger downstream tasks based on the
> > watermark of a table.
> > Querying compact information through SQL or APIs is indeed a good way.
> > It is relatively simple to query historical compact records. However,
> > if you want to know the current compact status of a table, using a
> > Listener may be simpler.
> >
> > Best
> > Shidayang
> >
> > On Mon, Aug 21, 2023 at 11:24 PM Shammon FY <[email protected]> wrote:
> > >
> > > Hi Jocean,
> > >
> > > Thanks for your explanation. I still have some questions.
> > >
> > > 1. What are the DDL events for Paimon used for? If you need to show
> > > tables for Paimon in your system, I think it's better to define
> > > table-related interfaces, and then you can implement them for Paimon,
> > > Iceberg and Hudi instead of adding a DDL listener in them. It's more
> > > general and you can even manage tables from other systems such as
> > > databases, MongoDB and Hive.
> > >
> > > 2. If some system information in `CompactEvent` is currently missing or
> > > there's no information about `compact`, I think a better way is to add
> > > this system information in Paimon, rather than adding a listener and
> > > creating an event with the information. Then the external system can get
> > > the information by SQL or API directly, which is a more reasonable
> > > approach.
> > >
> > > 3. Also what is the `CommitEvent` used for? Currently we have metrics for
> > > `Commit` and jobs can report them. How about adding a customized reporter
> > > for metrics instead of a listener for `CommitEvent`?
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > >
> > >
> > > On Mon, Aug 21, 2023 at 5:16 PM Jocean shi <[email protected]> wrote:
> > >
> > > > Hi Shammon FY,
> > > >
> > > > Thanks for your comments. I’d like to share my thoughts about your
> > > > comments.
> > > >
> > > > 1. Public Interface
> > > > Thank you for the reminder. I overlooked the correspondence between
> > > > the Public Interface of PIP and the "@Public" annotation.
> > > > My idea was that Event, Listener, and ListenerFactory are public,
> > > > while the others are non-public.
> > > >
> > > > 2.  Add `Factory` to create `Listener`
> > > > Great suggestion, I have already added the ListenerFactory to PIP.
> > > >
> > > > 3. Flink and Spark support meta-data listeners
> > > > It will be very inconvenient for users to obtain DDL information
> > > > through engines. Firstly, there are many implementations of various
> > > > engines that need to be connected. Secondly, in addition to Flink and
> > > > Spark, many engines do not support meta-data listeners. As a general
> > > > data lake, Paimon should have its own mechanism for meta-data
> > > > listeners.
> > > >
> > > > 4. report events such as commit/compact to an external system
> > > > CompactEvent: Currently, the compact state is a black box, and users
> > > > cannot obtain the information through SQL or API.
> > > > CommitEvent: Currently, the methods of querying through SQL or API are
> > > > based on polling, which makes it difficult for users to perceive
> > > > commit operations in a timely manner and consumes a lot of resources.
> > > >
> > > > Best
> > > > Shidayang
> > > >
> > > > On Fri, Aug 18, 2023 at 2:07 PM Shammon FY <[email protected]> wrote:
> > > > >
> > > > > Thanks @Jocean for starting this discussion, I have some comments
> > > > >
> > > > > 1. About the public interfaces in the PIP, we should add @Public for
> > > > > them, such as `Event`, `Listener` and even `CommitEvent` and other
> > > > > events. But for `Listeners`, I don't think it should be a public
> > > > > interface. All fields in the public interface for users should be
> > > > > `Public` too, but I found the information such as `ManifestEntry` in
> > > > > `CommitEvent` is not a public interface. I think you may need to
> > > > > reconsider which interfaces need to be marked with @Public and which
> > > > > do not.
> > > > >
> > > > > 2. In general, it is better to give a `Factory` to create `Listener`,
> > > > > and they should all be marked as `@Public`; you can see
> > > > > `CatalogFactory`->`Catalog` as an example.
> > > > >
> > > > > 3. Currently Flink and Spark support meta-data listeners and we can
> > > > > support reporting DDL information there. Do we need to add the same
> > > > > listener in Paimon?
> > > > >
> > > > > 4. Do we need to report events such as commit/compact to an
> > > > > external system? Currently we have some system tables and users can
> > > > > get this information by SQL or API. Should the external system query
> > > > > this information regularly instead of having a listener push it?
> > > > >
> > > > > Best,
> > > > > Shammon FY
> > > > >
> > > > >
> > > > > On Tue, Aug 15, 2023 at 11:08 AM Jocean shi <[email protected]> wrote:
> > > > >
> > > > > > Hi devs:
> > > > > >
> > > > > > We would like to start a discussion about PIP-8: Introduce
> > > > > > listeners for Paimon[1].
> > > > > >
> > > > > > In production environments, users often need to perceive the state
> > > > > > changes of a Paimon table, such as whether a new file has been
> > > > > > committed to the table, which partitions the committed files are in,
> > > > > > the size and number of the committed files, the status and type of
> > > > > > compaction, and operations like table creation, deletion, and schema
> > > > > > changes.
> > > > > > So, we introduce a Listener system for Paimon.
> > > > > > Looking forward to hearing from you.
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-8%3A+Introduce+listeners+for+Paimon
> > > > > >
> > > > > > Best
> > > > > > shidayang
> > > > > >
> > > >
> >
