GitHub user ddvlanck created a discussion: Performance issues on small dataset: any recommended configuration or tuning options?
Hi!

We are currently evaluating Apache Jena Fuseki by loading a dataset and running a set of SPARQL queries against it. However, I’m running into performance issues that seem unexpected for the dataset size.

⚠️ Issue

We are testing with a dataset containing **~16 million triples**, and we’ve configured a **60-second timeout** for query responses in our experiment setup. All SELECT queries, ranging from those expected to return around **10 rows** up to those expected to return **~1M rows**, hit the timeout limit. Even the queries with very small expected result sets fail to return within 60 seconds.

The system running Apache Jena Fuseki has **128 GB RAM** available, so hardware limitations don’t appear to be the cause.

🧪 Setup

- Fuseki version: created a local Docker build using Apache Jena `5.6.0`
- Deployment: Docker
- Dataset size: 16,127,232 triples
- Query type: SELECT
- Operating system: Ubuntu 24.04.3 LTS

We updated the `JAVA_OPTIONS` to `-Xms90g -Xmx90g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch -XX:+UseStringDeduplication`

💡 What we’re looking for

We’d like to understand whether there are configuration options, tuning parameters, or indexing strategies we should apply to improve query performance. Any guidance on:

- Recommended settings for larger datasets
- Known performance limitations in Fuseki
- Configuration flags that influence query planning or execution

…would be greatly appreciated.

Thanks in advance!

GitHub link: https://github.com/apache/jena/discussions/3600
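For anyone who wants to reproduce the measurement described above, a minimal sketch of a timing harness might look like the following. It is not the harness used in the experiment. The endpoint URL, the dataset name `ds`, and the helper names `build_request`, `count_bindings`, and `time_query` are assumptions made for illustration; only the 60-second limit comes from the post. It relies on the standard SPARQL 1.1 Protocol (form-encoded POST) and the SPARQL JSON results format.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed values, not taken from the post: adjust host, port, and dataset name.
ENDPOINT = "http://localhost:3030/ds/sparql"  # hypothetical dataset name "ds"
TIMEOUT_SECONDS = 60  # the 60-second limit mentioned in the post


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def count_bindings(results_json: str) -> int:
    """Count the result rows in a SPARQL JSON results document."""
    return len(json.loads(results_json)["results"]["bindings"])


def time_query(endpoint: str, query: str, timeout: float = TIMEOUT_SECONDS):
    """Run one SELECT query and return (row_count, elapsed_seconds).

    The urllib timeout applies to each blocking socket operation, so it only
    approximates a total wall-clock limit for the whole response.
    """
    start = time.perf_counter()
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = response.read().decode("utf-8")
    return count_bindings(payload), time.perf_counter() - start
```

Calling `time_query(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")` against a running server returns the row count and the elapsed time for a trivial query, which can help show whether the slowness is specific to the test queries or affects every request.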
