Hi Stephan,

Thanks for your reply.

In some cases, your solution works.

However, in some scenarios it does not meet the requirements:

   - One program has multiple job instances;
   - If we run Flink as a platform, we cannot know the packages of users'
   programs in advance, so we cannot configure the log profiles before
   starting the cluster.

Chesnay's understanding is right: we need to split the business logs by
job.
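
For reference, the two-target filtering suggested in your quoted reply would
look roughly like the following log4j 1.x properties sketch. Here
`my.company.project` is only a placeholder: on a multi-tenant platform, that
package name is exactly what we cannot know before the cluster starts.

```properties
# Framework log: everything under org.apache.flink goes to flink.log
log4j.logger.org.apache.flink=INFO, flink
log4j.additivity.org.apache.flink=false
log4j.appender.flink=org.apache.log4j.FileAppender
log4j.appender.flink.File=flink.log
log4j.appender.flink.layout=org.apache.log4j.PatternLayout
log4j.appender.flink.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n

# Business log: requires knowing the user's package name in advance
log4j.logger.my.company.project=INFO, business
log4j.additivity.my.company.project=false
log4j.appender.business=org.apache.log4j.FileAppender
log4j.appender.business.File=business.log
log4j.appender.business.layout=org.apache.log4j.PatternLayout
log4j.appender.business.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n
```

This works when there is a single, known user package, but not when the
cluster hosts arbitrary user jobs.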

Recently, a user also reported this requirement. [1]

[1]: https://issues.apache.org/jira/browse/FLINK-12953
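
For completeness, the MDC-based alternative discussed further down in the
thread could, on the logback side, be sketched with a SiftingAppender keyed
on an assumed `jobId` MDC value set on each task thread. This is only a
sketch of the idea, and it also illustrates the compatibility problem: to
our knowledge, log4j 1.x has no equivalent appender built in, so a second,
framework-specific mechanism would still be needed.

```xml
<!-- logback.xml sketch: one log file per jobId MDC value ("jobId" is an
     assumed key that each task thread would have to set via MDC.put) -->
<configuration>
  <appender name="SIFT" class="ch.qos.logback.classic.sift.SiftingAppender">
    <discriminator>
      <key>jobId</key>
      <defaultValue>no-job</defaultValue>
    </discriminator>
    <sift>
      <appender name="FILE-${jobId}" class="ch.qos.logback.core.FileAppender">
        <file>taskmanager-${jobId}.log</file>
        <encoder>
          <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
    </sift>
  </appender>

  <root level="INFO">
    <appender-ref ref="SIFT"/>
  </root>
</configuration>
```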

Stephan Ewen <se...@apache.org> wrote on Thu, Jul 4, 2019, at 6:38 PM:

> Is that something that can just be done by the right logging framework and
> configuration?
>
> Like having a log framework with two targets, one filtered on
> "org.apache.flink" and the other one filtered on "my.company.project" or
> so?
>
> On Fri, Mar 1, 2019 at 3:44 AM vino yang <yanghua1...@gmail.com> wrote:
>
> > Hi Jamie Grier,
> >
> > Thank you for your reply, let me add some explanations to this design.
> >
> > First of all, as stated in "Goal", it is mainly aimed at the "Standalone"
> > cluster mode. Although we have also implemented it for Flink on YARN, this
> > does not mean we cannot turn the feature off via an option. It should be
> > noted that the separation is based on the "log configuration file", so it
> > is very scalable and even allows users to define the log pattern of the
> > configuration file (of course, this is an extension feature, not mentioned
> > in the design document). In fact, "multiple files are a special case of a
> > single file": we can provide an option to keep the single-file behavior as
> > the default, which should be the scenario you expect in containers.
> >
> > According to Flink's official 2016 user survey [1], the number of users on
> > standalone mode is quite close to that of YARN mode (unfortunately there
> > is no comparable data for 2017). Although we mainly use Flink on YARN now,
> > we have used standalone mode in depth (close to 20 trillion messages of
> > daily processing volume). In that scenario, the user logs generated by
> > different jobs' tasks are mixed together, which makes it very difficult to
> > locate an issue. Moreover, since we configure a log file rolling policy,
> > we have to log in to the server to view the logs. Therefore, we expect
> > that for the same task manager, the user logs generated by tasks from the
> > same job can be distinguished.
> >
> > In addition, I have tried MDC, but it cannot achieve the goal. Flink's
> > underlying logging uses log4j 1.x and logback, and we need to be
> > compatible with both frameworks at the same time. We also cannot make
> > large-scale changes to the existing code, and the mechanism should be
> > transparent to users.
> >
> > Some other points:
> >
> > 1) Many of our users have experience with Storm and Spark, and they are
> > more accustomed to that style in standalone mode;
> > 2) Splitting the user log by job will also help us implement a per-job
> > "business log aggregation" feature.
> >
> > Best,
> > Vino
> >
> > [1]: https://www.ververica.com/blog/flink-user-survey-2016-part-1
> >
> > Jamie Grier <jgr...@lyft.com.invalid> wrote on Fri, Mar 1, 2019, at 7:32 AM:
> >
> > > If I understood this correctly, I think this design may be going in the
> > > wrong direction.  The problem with Flink logging, when you are running
> > > multiple jobs in the same TMs, is not just about separating out the
> > > business-level logging into separate files.  The Flink framework itself
> > > logs many things where there is clearly a single job in context, but
> > > that all ends up in the same log file, with no clear separation amongst
> > > the log lines.
> > >
> > > Also, I don't think aiming for multiple log files is a very good idea
> > > either.  It's common, especially in container-based deployments, that
> > > the expectation is that a process (like Flink) logs everything to stdout
> > > and the surrounding tooling takes care of routing that log data
> > > somewhere.  I think we should stick with that model and expect that
> > > there will be a single log stream coming out of each Flink process.
> > >
> > > Instead, I think it would be better to enhance Flink's logging
> > > capability so that the appropriate context can be added to each log
> > > line, with the exact format controlled by the end user.  It might make
> > > sense to take a look at MDC, for example, as a way to approach this.
> > >
> > >
> > > On Thu, Feb 28, 2019 at 4:24 AM vino yang <yanghua1...@gmail.com>
> wrote:
> > >
> > > > Dear devs,
> > > >
> > > > Currently, for log output, Flink does not explicitly distinguish
> > > > between framework logs and user logs. In the Task Manager, logs from
> > > > the framework are intermixed with the user's business logs. In some
> > > > deployment modes, such as standalone or YARN session, task instances
> > > > of different jobs are deployed in the same Task Manager. This makes
> > > > the log event flow more confusing, unless users explicitly use tags to
> > > > distinguish them, and it makes locating problems more difficult and
> > > > inefficient. For the YARN job cluster deployment mode, this problem is
> > > > not as serious, but we still need to manually distinguish between the
> > > > framework log and the business log. Overall, we found that Flink's
> > > > existing log model has the following problems:
> > > >
> > > >
> > > >    - Framework log and business log are mixed in the same log file.
> > > >    There is no way to make a clear distinction, which is not conducive
> > > >    to problem location and analysis;
> > > >    - Not conducive to the independent collection of business logs.
> > > >
> > > >
> > > > Therefore, we propose a mechanism to separate the framework log from
> > > > the business log. It can split the existing log files of the Task
> > > > Manager.
> > > >
> > > > Currently, it is associated with two JIRA issues:
> > > >
> > > >    - FLINK-11202 [1]: Split log file per job
> > > >    - FLINK-11782 [2]: Enhance TaskManager log visualization by listing
> > > >    all log files in the Flink web UI
> > > >
> > > >
> > > > We have implemented and validated it in standalone and Flink on YARN
> > > > (job cluster) mode.
> > > >
> > > > sketch 1:
> > > >
> > > > [image: flink-web-ui-taskmanager-log-files.png]
> > > >
> > > > sketch 2:
> > > > [image: flink-web-ui-taskmanager-log-files-2.png]
> > > >
> > > > Design documentation:
> > > >
> > > > https://docs.google.com/document/d/1TTYAtFoTWaGCveKDZH394FYdRyNyQFnVoW5AYFvnr5I/edit?usp=sharing
> > > >
> > > > Best,
> > > > Vino
> > > >
> > > > [1]: https://issues.apache.org/jira/browse/FLINK-11202
> > > > [2]: https://issues.apache.org/jira/browse/FLINK-11782
> > > >
> > >
> >
>
