GitHub user ddvlanck created a discussion: Performance issues on small dataset: 
any recommended configuration or tuning options?

Hi! We are currently evaluating Apache Jena Fuseki by loading a dataset and 
running a set of SPARQL queries against it.
However, we're running into performance issues that seem unexpected for the 
dataset size.

⚠️ Issue

We are testing with a dataset containing **~16 million triples**, and we’ve 
configured a **60-second timeout** for query responses in our experiment setup.
All SELECT queries—ranging from those expected to return around **10 rows** up 
to those expected to return **~1M rows**—hit the timeout limit. Even the 
queries with very small expected result sets fail to return within 60 seconds.
The system running Apache Jena Fuseki has **128 GB RAM** available, so hardware 
limitations don’t appear to be the cause.
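
For anyone who wants to reproduce the measurement, a timed SELECT against the SPARQL endpoint could be sketched roughly as follows. This is only an illustration, not our actual test harness: the endpoint URL (Fuseki's default port 3030 with a hypothetical dataset name `ds`), the probe query, and the helper names `build_request` and `timed_select` are all assumptions.

```python
import time
import urllib.parse
import urllib.request

# Assumed endpoint: Fuseki's default port 3030 and a hypothetical dataset "ds".
ENDPOINT = "http://localhost:3030/ds/sparql"
TIMEOUT_SECONDS = 60  # same limit as in the experiment described above


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def timed_select(endpoint, query, timeout=TIMEOUT_SECONDS):
    """Run the query and return (elapsed_seconds, response_bytes).

    The timeout applies to blocking socket operations, so it only
    approximates a total deadline. An exception is raised if the
    server does not respond in time.
    """
    request = build_request(endpoint, query)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    return time.perf_counter() - start, body


if __name__ == "__main__":
    # Hypothetical probe query with a very small result set.
    probe = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
    elapsed, body = timed_select(ENDPOINT, probe)
    print(f"{elapsed:.2f}s, {len(body)} bytes")
```

Running a trivial `LIMIT 10` probe like this first helps separate a general server or storage problem from slow individual query plans.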

🧪 Setup

- Fuseki version: created a local Docker build using Apache Jena `5.6.0`
- Deployment: Docker
- Dataset size: 16,127,232 triples
- Query type: SELECT
- Operating system: Ubuntu 24.04.3 LTS

We updated the `JAVA_OPTIONS` to `-Xms90g -Xmx90g -XX:+UseG1GC 
-XX:MaxGCPauseMillis=200 -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch 
-XX:+UseStringDeduplication`

💡 What we’re looking for

We’d like to understand whether there are configuration options, tuning 
parameters, or indexing strategies we should apply to improve query performance.
Any guidance on:
- Recommended settings for larger datasets
- Known performance limitations in Fuseki
- Configuration flags that influence query planning or execution

…would be greatly appreciated.

Thanks in advance!

GitHub link: https://github.com/apache/jena/discussions/3600

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

