ccoffline opened a new issue #6529:
URL: https://github.com/apache/incubator-doris/issues/6529
**Is your feature request related to a problem? Please describe.**
Observability is a very important capability for a distributed system. We
have repeatedly run into the following problems while maintaining Doris:
* Troubleshooting through logs is difficult. To trace a load transaction, for
example, we have to log in to multiple machines and search by several
signatures (Label/TransactionID/QueryID), which takes a lot of time. Some log
lines carry no signature at all, so we can only guess which transaction they
belong to.
* Logs are not structured, which makes analyzing them difficult and in many
cases impossible.
* Our company has a centralized log collection system with its own log
format. Even where Doris logs have some loosely formatted output, it does not
match our format. We have to add custom logging for the most important
processes, namely query and stream load.
**Describe the solution you'd like**
We are considering a generic extension to the logging framework that supports
structured logging in Doris and lets Doris maintainers configure their own
structured logging output format.
```java
// unstructured logging; outputs 'here is an info for a query, queryId=xxx'
LOG.info("here is an info for a query, queryId={}", queryId);
// structured logging (proposed API); outputs a custom log format, such as
// 'here is an info for a query {"query_id":"xxx"}'
LOG.tag("query_id", queryId).info("here is an info for a query");
```
This allows maintainers to collect and transform logs however they want. In
our case, we will collect logs into our log center and convert them into
relational records, so that we can process and analyze logs as a table, or
perhaps several tables.
This can happen without custom logging statements. Contributors can focus on
adding useful information to logs and cleaning up useless ones.
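To make the idea concrete, here is a minimal sketch of how tagging and a pluggable output format could fit together. Every name in it (`LogFormatter`, `JsonSuffixFormatter`, `TaggedMessage`, `render`) is hypothetical and not an existing Doris or logging-library API; it prints to stdout instead of delegating to a real logger, and the JSON escaping is deliberately simplistic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TaggedLogSketch {
    /** Hypothetical extension point: turns a message and its tags into one log line. */
    interface LogFormatter {
        String format(String message, Map<String, String> tags);
    }

    /** One possible default: the message followed by a JSON object holding the tags. */
    static final class JsonSuffixFormatter implements LogFormatter {
        @Override
        public String format(String message, Map<String, String> tags) {
            StringJoiner json = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : tags.entrySet()) {
                json.add(quote(e.getKey()) + ":" + quote(e.getValue()));
            }
            return message + " " + json;
        }

        // Escapes only backslashes and double quotes; a real formatter needs more.
        private static String quote(String s) {
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
    }

    /** Collects tags in insertion order, then hands them to the configured formatter. */
    static final class TaggedMessage {
        private final LogFormatter formatter;
        private final Map<String, String> tags = new LinkedHashMap<>();

        TaggedMessage(LogFormatter formatter) {
            this.formatter = formatter;
        }

        TaggedMessage tag(String name, Object value) {
            tags.put(name, String.valueOf(value));
            return this;
        }

        /** Builds the final log line without emitting it. */
        String render(String message) {
            return formatter.format(message, tags);
        }

        /** Stand-in for a real logger call: prints the rendered line. */
        void info(String message) {
            System.out.println("INFO " + render(message));
        }
    }

    public static void main(String[] args) {
        // Default JSON-suffix format.
        new TaggedMessage(new JsonSuffixFormatter())
                .tag("query_id", "xxx")
                .info("here is an info for a query");

        // A maintainer-supplied key=value format, plugged in without touching call sites.
        LogFormatter keyValue = (msg, tags) -> {
            StringBuilder sb = new StringBuilder(msg);
            tags.forEach((k, v) -> sb.append(' ').append(k).append('=').append(v));
            return sb.toString();
        };
        new TaggedMessage(keyValue)
                .tag("query_id", "xxx")
                .info("here is an info for a query");
    }
}
```

The point of the sketch is that call sites only attach tags; the output format is chosen in one place, so each deployment can swap in its own formatter.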
Doris may need to define a naming convention for tag names, such as camelCase
or snake_case, or provide common tag methods and let maintainers customize
their own tag names. This is open for discussion.
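One way the "common tags with customizable names" option could look is sketched below. The `CommonTag` enum and both naming helpers are hypothetical, chosen only to illustrate the idea: code refers to a fixed set of tags, and the emitted name is derived by whichever convention a deployment picks.

```java
import java.util.Locale;

public class TagNamingSketch {
    /** Hypothetical set of common tags, based on the signatures mentioned in this issue. */
    enum CommonTag { QUERY_ID, TRANSACTION_ID, LABEL }

    /** snake_case rendering, for example QUERY_ID becomes query_id. */
    static String snakeCase(CommonTag tag) {
        return tag.name().toLowerCase(Locale.ROOT);
    }

    /** camelCase rendering, for example QUERY_ID becomes queryId. */
    static String camelCase(CommonTag tag) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : tag.name().toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '_') {
                upperNext = true; // drop the underscore, capitalize the next letter
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (CommonTag tag : CommonTag.values()) {
            System.out.println(snakeCase(tag) + " / " + camelCase(tag));
        }
    }
}
```

With this shape, contributors never type raw tag strings, and switching conventions is a configuration decision instead of a code change.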
**Describe alternatives you've considered**
We have considered importing an observability framework such as
`OpenTelemetry`. However, OpenTelemetry is currently still maturing in many
areas. For example, the official distribution does not support Thrift, the
C++ implementation is in pre-alpha, and the logging integration is immature.
We can extend the logging capabilities to support flexible monitoring and
analysis for maintainers of Doris clusters. At the same time, we can introduce
`OpenTelemetry` to collect trace and metric data, which does not conflict with
the logging extension. Perhaps once `OpenTelemetry` is capable enough for
logging, we can clean up the logs that have become redundant.