Hi Jocean

Thanks for your answer. I think there are two types of information you
want to report: DDL events and runtime events such as commit and
compaction.

For the DDL events, I don't quite understand why you need to poll the table
information regularly. Paimon is itself a storage system that holds all of
its meta information, and even if you poll that information from Paimon, you
still need to store it somewhere. I think you can just use Paimon as the
storage itself. If obtaining Paimon tables is relatively slow, for example
with the large number of tables you mentioned, I think we should improve
that, perhaps by adding a table cache?
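
To make the table cache idea concrete, here is a minimal sketch. It is only
an illustration under assumptions: `TableCache`, its loader function and the
TTL are hypothetical names, not existing Paimon classes, and the real loading
logic would sit behind whatever catalog API is in use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical TTL cache for table metadata; not part of Paimon. */
final class TableCache<T> {

    /** Cached value plus the time it was loaded. */
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
    private final Function<String, T> loader; // assumed: loads metadata by table identifier
    private final long ttlMillis;

    TableCache(Function<String, T> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    T get(String identifier) {
        return get(identifier, System.currentTimeMillis());
    }

    /** Returns the cached value, reloading it when missing or older than the TTL. */
    T get(String identifier, long nowMillis) {
        Entry<T> entry = entries.compute(identifier, (id, old) -> {
            boolean stale = old == null || nowMillis - old.loadedAtMillis >= ttlMillis;
            return stale ? new Entry<T>(loader.apply(id), nowMillis) : old;
        });
        return entry.value;
    }

    /** Drops a cached entry, e.g. after a schema change. */
    void invalidate(String identifier) {
        entries.remove(identifier);
    }
}
```

With something like this, repeated lookups of the same table within the TTL
would not hit the underlying storage again.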

For the runtime events, I understand that they do need to be reported to a
system like `Table Service`. But my question is: can we do this with the
existing metrics mechanism? For example, by reporting the relevant metrics
to the `Table Service` instead of adding a new `listener`? If the metrics
information is not complete enough, we can keep adding information to it.
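
As a rough sketch of the metrics-based alternative: a custom reporter
receives commit metrics and decides whether a table should be compacted. All
names here (`MetricsReporter`, `TableServiceReporter`, the `addedFiles`
metric key) are assumptions for illustration only, not the existing Paimon
metrics API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical reporter interface; a real one would follow the engine's metrics API. */
interface MetricsReporter {
    void report(String table, Map<String, Long> metrics);
}

/** Hypothetical reporter that turns commit metrics into compaction requests. */
final class TableServiceReporter implements MetricsReporter {
    private final long addedFilesThreshold;
    private final List<String> compactionRequests = new ArrayList<>();

    TableServiceReporter(long addedFilesThreshold) {
        this.addedFilesThreshold = addedFilesThreshold;
    }

    @Override
    public void report(String table, Map<String, Long> metrics) {
        // "addedFiles" is an assumed metric name for files added by a commit.
        long addedFiles = metrics.getOrDefault("addedFiles", 0L);
        if (addedFiles >= addedFilesThreshold) {
            // A real Table Service would be notified here; this sketch only records the request.
            compactionRequests.add(table);
        }
    }

    List<String> compactionRequests() {
        return compactionRequests;
    }
}
```

The external system would then only need to plug in its own reporter rather
than depend on a new listener interface.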

Best,
Shammon FY

On Tue, Aug 22, 2023 at 2:20 PM Jocean shi <[email protected]> wrote:

> Hi Shammon FY,
>
> I get your point, but the role of a Listener is more towards
> notification. For example, as you mentioned, we can query the relevant
> information through APIs for DDL and commit information. However, when
> we want to know if there have been any changes to the relevant
> information, we need to constantly poll the tables. This mechanism can
> be resource-intensive, especially when there are many tables. With a
> Listener, we can promptly detect changes in status. Consider a
> separate Table service that has a requirement to compact all tables,
> and the compact parameters are stored in the options. When there is a
> change in the options of a table, the Table Service needs to be
> notified promptly to determine whether to immediately compact the
> table. When there is new data committed to a table, it needs to be
> promptly detected to determine whether to compact it. Also, users need
> the assistance of CommitEvent to trigger downstream tasks based on the
> watermark of a table.
> Querying compact information through SQL or APIs is indeed a good way.
> It is relatively simple to query historical compact records. However,
> if you want to know the current compact status of a table, using a
> Listener may be simpler.
>
> Best
> Shidayang
>
> Shammon FY <[email protected]> wrote on Mon, Aug 21, 2023 at 23:24:
> >
> > Hi Jocean,
> >
> > Thanks for your explanation. I still have some questions.
> >
> > 1. What are the DDL events for Paimon used for? If you need to show
> > tables for Paimon in your system, I think it's better to define table
> > related interfaces, and then you can implement them for Paimon, Iceberg
> > and Hudi instead of adding a DDL listener in them. It's more general and
> > you can even manage other tables such as databases, MongoDB and Hive.
> >
> > 2. If some system information in `CompactEvent` is currently missing or
> > there's no information about `compact`, I think a better way is to add
> > this system information in Paimon, rather than adding a listener and
> > creating an event with the information. Then the external system can get
> > the information by SQL or API directly, which is a more reasonable
> > approach.
> >
> > 3. Also what is the `CommitEvent` used for? Currently we have metrics for
> > `Commit` and jobs can report them. How about adding a customized reporter
> > for metrics instead of a listener for `CommitEvent`?
> >
> > Best,
> > Shammon FY
> >
> >
> >
> >
> > On Mon, Aug 21, 2023 at 5:16 PM Jocean shi <[email protected]> wrote:
> >
> > > Hi Shammon FY,
> > >
> > > Thanks for your comments. I’d like to share my thoughts about your
> > > comments.
> > >
> > > 1. Public Interface
> > > Thank you for the reminder. I overlooked the correspondence between
> > > the Public Interface of PIP and the "@Public" annotation.
> > > My idea was that Event, Listener, and ListenerFactory are public,
> > > while the others are non-public.
> > >
> > > 2.  Add `Factory` to create `Listener`
> > > Great suggestion, I have already added the ListenerFactory to PIP.
> > >
> > > 3. Flink and Spark support meta-data listeners
> > > It will be very inconvenient for users to obtain DDL information
> > > through engines. Firstly, there are many implementations of various
> > > engines that need to be connected. Secondly, in addition to Flink and
> > > Spark, many engines do not support meta-data listeners. As a general
> > > data lake, Paimon should have its own mechanism for meta-data
> > > listeners.
> > >
> > > 4. report events such as commit/compact to an external system
> > > CompactEvent: Currently, the compact state is a black box, and users
> > > cannot obtain the information through SQL or API.
> > > CommitEvent: Currently, the methods of querying through SQL or API are
> > > based on polling, which makes it difficult for users to perceive
> > > commit operations in a timely manner and consumes a lot of resources.
> > >
> > > Best
> > > Shidayang
> > >
> > > Shammon FY <[email protected]> wrote on Fri, Aug 18, 2023 at 14:07:
> > > >
> > > > Thanks @Jocean for starting this discussion, I have some comments
> > > >
> > > > 1. About the public interfaces in the PIP, we should add @Public for
> > > > them such as `Event`, `Listener` and even `CommitEvent` and other
> > > > events. But for `Listeners`, I don't think it should be a public
> > > > interface. All fields in the public interface for users should be
> > > > `Public` too, but I found the information such as `ManifestEntry` in
> > > > `CommitEvent` is not a public interface. I think you may need to
> > > > reconsider which interfaces need to be marked with @Public and which
> > > > are not.
> > > >
> > > > 2. In general, it is better to give a `Factory` to create `Listener`
> > > > which should be all marked as `@Public` and you can see
> > > > `CatalogFactory`->`Catalog` as an example.
> > > >
> > > > 3. Currently Flink and Spark support meta-data listeners and we can
> > > > support reporting DDL information there. Do we need to add the same
> > > > listener in Paimon?
> > > >
> > > > 4. Do we need to report events such as commit/compact to an
> > > > external system? Currently we have some system tables and users can
> > > > get this information by SQL or API. Should the external system query
> > > > this information regularly instead of having a listener push it?
> > > >
> > > > Best,
> > > > Shammon FY
> > > >
> > > >
> > > > On Tue, Aug 15, 2023 at 11:08 AM Jocean shi <[email protected]> wrote:
> > > >
> > > > > Hi devs:
> > > > >
> > > > > We would like to start a discussion about PIP-8: Introduce
> > > > > listeners for Paimon[1].
> > > > >
> > > > > In production environments, users often need to perceive the state
> > > > > changes of Paimon table, such as whether a new file has been
> > > > > committed to the table, in which partitions the committed files are,
> > > > > the size and number of the committed files, the status and type of
> > > > > compaction, operations like table creation, deletion, and schema
> > > > > changes, etc.
> > > > > So, we introduce a Listener system for Paimon.
> > > > > Looking forward to hearing from you.
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-8%3A+Introduce+listeners+for+Paimon
> > > > >
> > > > > Best
> > > > > shidayang
> > > > >
> > >
>
