[
https://issues.apache.org/jira/browse/SAMZA-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063637#comment-14063637
]
Yan Fang commented on SAMZA-300:
--------------------------------
Thanks for the explanation.
{quote}
1) expose intra-job (job-level) metrics in the YARN AM 2) expose inter-job
(topology-level) metrics in a standalone dashboard.
{quote}
These may need to be discussed and done in other tickets too.
{quote}
we should open some tickets for the relevant ones. All three you list are
useful. Ganglia also, probably.
{quote}
Created separate tickets for each format.
> Track producers and consumers of streams
> ----------------------------------------
>
> Key: SAMZA-300
> URL: https://issues.apache.org/jira/browse/SAMZA-300
> Project: Samza
> Issue Type: New Feature
> Reporter: Martin Kleppmann
>
> Each Samza job runs independently, which has a lot of advantages. However,
> there are situations in which it would be valuable to have a global overview
> of the data flows between jobs. For example:
> - It's important for correctness that only one job ever publishes to a given
> checkpoint or changelog stream — if several jobs publish to the same stream,
> the result is nonsensical. However, we currently have no way of enforcing
> that. It would be good if a job could take a "write lock" on a stream, and
> thus prevent others from writing to it.
> - It would be awesome to have a dashboard/visualization that graphically
> shows the job graph, and visually highlights the health of a job (e.g.
> whether a job has fallen behind).
> - The job graph would also be generally useful for tracking data provenance
> (finding consumers who would be affected by a schema change, finding the team
> that is responsible for producing a particular stream, etc.).
> - Potentially could include additional metadata about streams, e.g. owner,
> serialization format, schema, documentation of semantics of the data, etc.
> (HCatalog for streams?)
> One possibility would be for Kafka to add some of this functionality,
> although it may also make sense to implement it in Samza (that way it would
> be available for non-Kafka systems as well, and could use knowledge about the
> job that Samza has, but Kafka hasn't).
> This is just a vague description to start a discussion. Please comment with
> your ideas on how to best implement this.
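As a starting point for that discussion, the "write lock" and job-graph ideas in the description could be sketched roughly as below. This is only an illustrative in-memory model, not anything Samza or Kafka provides: `StreamRegistry`, `StreamWriteLockError`, and all method, job, and stream names are hypothetical, and a real implementation would need shared, durable storage rather than Python dictionaries.

```python
from collections import defaultdict


class StreamWriteLockError(Exception):
    """Raised when a job tries to write to a stream that another job holds exclusively."""


class StreamRegistry:
    """Hypothetical in-memory registry of which jobs produce and consume which streams."""

    def __init__(self):
        self.producers = defaultdict(set)  # stream name -> jobs that write to it
        self.consumers = defaultdict(set)  # stream name -> jobs that read from it
        self.write_locks = {}              # stream name -> the single job allowed to write

    def register_producer(self, job, stream, exclusive=False):
        # Reject the write if a different job already holds the lock.
        owner = self.write_locks.get(stream)
        if owner is not None and owner != job:
            raise StreamWriteLockError(f"{stream} is write-locked by {owner}")
        if exclusive:
            # A lock can only be taken if no other job is already producing.
            others = self.producers[stream] - {job}
            if others:
                raise StreamWriteLockError(
                    f"{stream} already has producers: {sorted(others)}"
                )
            self.write_locks[stream] = job
        self.producers[stream].add(job)

    def register_consumer(self, job, stream):
        self.consumers[stream].add(job)

    def consumers_of(self, stream):
        # Jobs that would be affected by, for example, a schema change.
        return sorted(self.consumers[stream])

    def job_graph(self):
        """Return sorted (producer job, stream, consumer job) edges."""
        edges = []
        for stream, producers in self.producers.items():
            for producer in producers:
                for consumer in self.consumers.get(stream, ()):
                    edges.append((producer, stream, consumer))
        return sorted(edges)
```

In this sketch, a job that owns a checkpoint or changelog stream would register with `exclusive=True`, any later producer registration from a different job would fail, and `job_graph()` yields the edges a dashboard could draw.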