lewismc opened a new pull request, #2367: URL: https://github.com/apache/tika/pull/2367
This covers task #1 (Research and Setup) from [TIKA-4513](https://issues.apache.org/jira/browse/TIKA-4513), namely:

> 1. Research and Setup
>
> Review OpenTelemetry Java getting-started guide and instrumentation registry for Tika-relevant libraries (e.g., auto-instrumentation for Jetty HTTP server, Apache HttpClient).
>
> Set up a local dev environment with Tika Server, OpenTelemetry Java agent (latest stable release), and a test collector (e.g., [Grafana Alloy](https://grafana.com/docs/alloy/latest/) in Docker).
>
> Prototype basic trace export for a sample /tika request.

I have lots of commentary to add, which I will do in due course. For now I am thinking of creating a video demo to better communicate the PR and what it offers.

One important point: instrumentation (per OTEL) is disabled by default, so the impact on existing Tika users is very small.

Before I ask people to review this PR, I want to agree on how to structure the constituent tasks in TIKA-4513. I will continue that conversation on the Jira ticket. In the meantime, if anyone wishes to take this for a spin, the markdown documentation (most notably `OPENTELEMETRY.md`) will get you up and running.

**NOTE**: I used `Claude-4.5-sonnet` to

- generate the markdown documents. Claude makes lots of mistakes, which I fixed by hand during my peer review. That being said, I have now stepped through this documentation line by line and I genuinely don't think I could have done it better myself if you gave me another week. I'm impressed and satisfied with the in-progress result.
- generate some Javadoc, notably the Javadocs with loads of commentary. Again, I'm satisfied with the outcome and I think it will help readers understand the additions.
- generate `TikaOpenTelemetryTest.java`, which gives some basic and convenient unit test coverage.
- figure out that `TikaOpenTelemetryConfig` had to `implements Initializable`. This saved me loads of study time, as it had been ages since I looked at tika-server internals and lots has changed.

This instrumentation mega-project is likely similar in scale to tika-pipes. There is still loads of work to do.

You will also have noticed that I used [Jaeger](https://www.jaegertracing.io/) as a basic example. I will provide another example using [Grafana Alloy as the OTEL collector](https://github.com/grafana/alloy), as it is much more closely aligned with $dayjob. That being said, I did want to demonstrate the power of OTEL as a vendor-agnostic instrumentation framework. Very powerful indeed.

In the meantime, here are a few screenshots which demonstrate what a trace containing two spans looks like in Jaeger. Pretty basic but exciting stuff.

<img width="1710" height="1112" alt="Screenshot 2025-10-16 at 22 27 23" src="https://github.com/user-attachments/assets/d6a81991-6ccd-4d54-b743-a8cfc29a7286" />

<img width="1710" height="1112" alt="Screenshot 2025-10-16 at 22 27 47" src="https://github.com/user-attachments/assets/a0e06925-9086-4d87-8dc8-1ab60a187aeb" />
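For anyone new to tracing, here is a rough sketch of what "a trace containing two spans" means structurally. It is a plain-Java model only: it does not use the OpenTelemetry API and it is not code from this PR. `TwoSpanTraceSketch`, `SpanModel`, and the span names `PUT /tika` and `tika.parse` are all hypothetical, chosen for illustration. The ID sizes (16-byte trace ID, 8-byte span ID) follow the OpenTelemetry and W3C Trace Context conventions.

```java
import java.security.SecureRandom;

public class TwoSpanTraceSketch {

    /** Hypothetical stand-in for a span; NOT the OpenTelemetry Span interface. */
    record SpanModel(String traceId, String spanId, String parentSpanId, String name) {}

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns numBytes random bytes encoded as lowercase hex. */
    static String randomHex(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** A root span starts a new trace and has no parent. */
    static SpanModel rootSpan(String name) {
        return new SpanModel(randomHex(16), randomHex(8), null, name);
    }

    /** A child span reuses the parent's trace ID and records the parent's span ID. */
    static SpanModel childSpan(SpanModel parent, String name) {
        return new SpanModel(parent.traceId(), randomHex(8), parent.spanId(), name);
    }

    public static void main(String[] args) {
        // Hypothetical span names: one for the HTTP request, one for the work done inside it.
        SpanModel request = rootSpan("PUT /tika");
        SpanModel parse = childSpan(request, "tika.parse");
        System.out.println(request);
        System.out.println(parse);
    }
}
```

A backend such as Jaeger groups spans by the shared trace ID and uses the parent span ID to draw the nesting, which is what the two screenshots above are showing.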
