If I understand this correctly, I think this design is going in the wrong
direction.  The problem with Flink logging, when you are running multiple
jobs in the same TMs, is not just about separating the business-level
logging into separate files.  The Flink framework itself logs many things
where a single job is clearly in context, yet all of it ends up in the same
log file with no clear separation among the log lines.

Also, I don't think aiming for multiple log files is a very good idea
either.  It's common, especially in container-based deployments, to expect
a process (like Flink) to log everything to stdout and to let the
surrounding tooling take care of routing that log data somewhere.  I think
we should stick with that model and expect a single log stream coming out
of each Flink process.

Instead, I think it would be better to enhance Flink's logging capability
so that the appropriate context (which job a line belongs to, for example)
can be added to each log line, with the exact format controlled by the end
user.  It might make sense to take a look at MDC (the Mapped Diagnostic
Context offered by SLF4J and Log4j), for example, as a way to approach this.
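
To make the MDC idea a bit more concrete, here is a rough sketch of what I
have in mind.  It is not Flink code and not the SLF4J API: it is a
hand-rolled, JDK-only stand-in for an MDC so that it runs without any
dependencies.  The class name JobContextLogging, the keys jobId and
taskName, and the tiny pattern syntax (%X{key} for a context value, %m for
the message) are all assumptions made for illustration.  In a real
implementation I would expect the framework to call org.slf4j.MDC.put(...)
and the user to reference the keys with %X{...} in their Log4j or Logback
pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, minimal MDC-like helper (illustration only). */
public class JobContextLogging {

    // Per-thread context map, so tasks of different jobs running on
    // different threads in the same process do not see each other's values.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Matches either a context placeholder %X{key} or the message token %m.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("%X\\{([^}]+)\\}|%m");

    /** Would be called by the framework when a thread starts working for a job. */
    public static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    /** Would be called by the framework when the thread is done with the job. */
    public static void clear() {
        CONTEXT.get().clear();
    }

    /**
     * Renders one log line from a user-controlled pattern.
     * Keys that are not set in the current thread's context render as "".
     */
    public static String format(String pattern, String message) {
        Map<String, String> ctx = CONTEXT.get();
        Matcher m = PLACEHOLDER.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is null when the token matched was %m.
            String value = (m.group(1) == null)
                    ? message
                    : ctx.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The end user decides where the context shows up in each line.
        String pattern = "%X{jobId} %X{taskName} - %m";

        put("jobId", "job-a");
        put("taskName", "map-1");
        // prints "job-a map-1 - checkpoint completed"
        System.out.println(format(pattern, "checkpoint completed"));

        clear();
        // No job in context: the placeholders render as empty strings.
        System.out.println(format(pattern, "no job in context"));
    }
}
```

The point of the sketch is the division of labor: the framework sets and
clears the context around job-specific work, everything still goes to one
stream, and the user's pattern alone decides how the job shows up in each
line.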


On Thu, Feb 28, 2019 at 4:24 AM vino yang <yanghua1...@gmail.com> wrote:

> Dear devs,
>
> Currently, Flink's log output does not explicitly distinguish between
> framework logs and user logs. In the Task Manager, logs from the framework
> are intermixed with the user's business logs. In some deployment models,
> such as Standalone or YARN session, task instances of different jobs are
> deployed in the same Task Manager. This makes the log event flow more
> confusing unless users explicitly use tags to distinguish them, and it
> makes locating problems more difficult and inefficient. For the YARN job
> cluster deployment model, this problem is less serious, but we still need
> to manually distinguish between the framework and the business logs.
> Overall, we found that Flink's existing log model has the following
> problems:
>
> - Framework logs and business logs are mixed in the same log file. There
>   is no way to make a clear distinction, which is not conducive to problem
>   location and analysis;
> - The mixed log file is not conducive to the independent collection of
>   business logs.
>
> Therefore, we propose a mechanism to separate the framework and business
> logs. It can split the existing log files of the Task Manager.
>
> Currently, it is associated with two JIRA issues:
>
> - FLINK-11202[1]: Split log file per job
> - FLINK-11782[2]: Enhance TaskManager log visualization by listing all
>   log files for Flink web UI
>
> We have implemented and validated it in standalone and Flink on YARN (job
> cluster) mode.
>
> sketch 1:
>
> [image: flink-web-ui-taskmanager-log-files.png]
>
> sketch 2:
> [image: flink-web-ui-taskmanager-log-files-2.png]
>
> Design documentation:
> https://docs.google.com/document/d/1TTYAtFoTWaGCveKDZH394FYdRyNyQFnVoW5AYFvnr5I/edit?usp=sharing
>
> Best,
> Vino
>
> [1]: https://issues.apache.org/jira/browse/FLINK-11202
> [2]: https://issues.apache.org/jira/browse/FLINK-11782
>
