Hi, folks.
Currently, pieces of a contribution guide are scattered across several places
(I had [1] and [2] in mind, and Chris also mentioned [3]), each covering
different facets of contributing to Apache Tika.
One thing that bothers me is that our codebase is very inconsistent in
style, formatting, and dependency management. That seems inevitable at
some stage of any popular open source project developed by many
contributors. But we can make it more consistent, with only moderate effort
needed to maintain that state afterwards.
I propose:
1. make the contribution guide a single source of truth and then
automatically mirror it to README.md/CONTRIBUTING.md for GitHub, publish it
on tika.a.o, etc.;
2. add info about logging in tika-core and other packages to this
contribution guide so that all contributions are consistent with the current
policy (with examples of how logging should be used in different modules):
   1. JUL in tika-core;
   2. SLF4J via a `private static final Logger LOG` field in all other
   modules;
   3. allow a logging backend (log4j) in tests (e.g. for tuning log
   levels of upstream libraries) and in the standalone application (e.g.
   to support the `--quiet` and `--verbose` CLI options);
   4. document logging configuration for the case when the OSGi bundle is
   used;
3. add info about dependency handling (e.g. the policy of no additional deps
in tika-core, exclusion of commons-logging/commons-logging-api/log4j
from dependencies, etc.);
4. integrate the checkstyle plugin [5], [6] into the Maven build so that
contributors can easily check that their code conforms to a simple starter
policy (4-space indent, no TABs, spaces before opening braces, spaces
after if/else/try/catch/finally, Egyptian-style braces);
5. add documentation about configuring checkstyle [5] in IDEs to
simplify its usage (I can write one for JetBrains IDEA at least).
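To make item 2 more concrete, here is a rough sketch of the kind of logging example the guide could carry for tika-core. The class and method names are made up for illustration and are not existing Tika code; the SLF4J variant for other modules is shown only in a comment so that the snippet compiles without extra dependencies.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical tika-core class: uses only java.util.logging (JUL),
// so it adds no dependencies to tika-core.
class CoreLoggingExample {
    private static final Logger LOG =
            Logger.getLogger(CoreLoggingExample.class.getName());

    // Returns a cleaned-up resource name, logging what happened on the way.
    static String describe(String name) {
        if (name == null) {
            LOG.log(Level.WARNING, "No resource name given, using a default");
            return "unknown";
        }
        LOG.log(Level.FINE, "Resource name: {0}", name);
        return name.trim();
    }

    // In any other module the same field would use SLF4J instead, e.g.:
    //   private static final org.slf4j.Logger LOG =
    //           org.slf4j.LoggerFactory.getLogger(SomeParser.class);
    //   LOG.debug("Resource name: {}", name);

    public static void main(String[] args) {
        System.out.println(describe(" report.pdf "));
    }
}
```

The only point of the sketch is the convention itself: one `private static final Logger LOG` field per class, JUL in tika-core, SLF4J everywhere else.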
The main point is to bring the Tika codebase to a more consistent and clear
state, simplify its maintenance, and make it easier for contributors to
produce clean and pretty patches. The checkstyle configuration should be as
simple as possible, so that refactoring the existing code to conform to it is
realistic.
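For illustration, code that follows the simple starter policy I have in mind (4-space indent, no TABs, a space before opening braces and after if/else/try/catch, Egyptian-style braces) would look roughly like this. The class is hypothetical and exists only to show the formatting:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, formatted according to the proposed starter policy.
class StyleExample {
    // Parses each value as an integer: nulls are skipped,
    // empty strings become 0, unparsable values become -1.
    static List<Integer> parseAll(List<String> values) {
        List<Integer> parsed = new ArrayList<>();
        for (String value : values) {
            if (value == null) {
                continue;
            } else if (value.isEmpty()) {
                parsed.add(0);
                continue;
            }
            try {
                parsed.add(Integer.parseInt(value.trim()));
            } catch (NumberFormatException e) {
                parsed.add(-1);
            }
        }
        return parsed;
    }
}
```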
Also, these items should be integrated gradually, step by step.
What do you think, folks?
Would it be a good thing for Tika and its community?
Would it bring any serious challenges that I've overlooked?
[1]: http://tika.apache.org/contribute.html
[2]: https://wiki.apache.org/tika/DeveloperResources
[3]: https://github.com/apache/tika/#contributing-via-github
[4]: https://issues.apache.org/jira/browse/TIKA-2316 tracking issue
[5]: http://checkstyle.sourceforge.net/
[6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/
--
Best regards,
Konstantin Gribov