[jira] [Closed] (JENA-2314) tdb2.tdbloader performance issue

Andy Seaborne (Jira) Thu, 24 Mar 2022 02:09:04 -0700


     [ 
https://issues.apache.org/jira/browse/JENA-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andy Seaborne closed JENA-2314.
-------------------------------

> tdb2.tdbloader performance issue
> --------------------------------
>
>                 Key: JENA-2314
>                 URL: https://issues.apache.org/jira/browse/JENA-2314
>             Project: Apache Jena
>          Issue Type: Question
>          Components: TDB2
>    Affects Versions: Jena 4.0.0, Jena 4.2.0, Jena 4.4.0
>         Environment: Java maximum memory: 12884901888
> symbol:http://jena.apache.org/ARQ#regexImpl = 
> symbol:http://jena.apache.org/ARQ#javaRegex
> symbol:http://jena.apache.org/ARQ#registryFunctions = 
> org.apache.jena.sparql.function.FunctionRegistry@1536602f
> symbol:http://jena.apache.org/ARQ#constantBNodeLabels = true
> symbol:http://jena.apache.org/ARQ#registryPropertyFunctions = 
> org.apache.jena.sparql.pfunction.PropertyFunctionRegistry@4ebea12c
> symbol:http://jena.apache.org/ARQ#stageGenerator = 
> org.apache.jena.tdb2.solver.StageGeneratorDirectTDB@2a1edad4
> symbol:http://jena.apache.org/ARQ#enablePropertyFunctions = true
> symbol:http://jena.apache.org/ARQ#strictSPARQL = false
> 13:02:36 INFO  loader          :: Loader = LoaderParallel
> 13:02:36 INFO  loader          :: Start: 6 files
> 13:02:48 INFO  loader          :: Add: 500,000 bdmhistoricalrecords.nq (Batch: 40,361 / Avg: 40,361)
> 13:03:00 INFO  loader          :: Add: 1,000,000 bdmhistoricalrecords.nq (Batch: 44,907 / Avg: 42,513)
> 13:03:10 INFO  loader          :: Add: 1,500,000 bdmhistoricalrecords.nq (Batch: 47,980 / Avg: 44,191)
> 13:03:25 INFO  loader          :: Add: 2,000,000 bdmhistoricalrecords.nq (Batch: 32,486 / Avg: 40,539)
> 13:33:06 INFO  loader          :: Add: 2,500,000 bdmhistoricalrecords.nq (Batch: 280 / Avg: 1,366)
> 14:30:30 INFO  loader          :: Add: 3,000,000 bdmhistoricalrecords.nq (Batch: 145 / Avg: 568)
> 14:52:29 INFO  loader          :: Add: 3,500,000 bdmhistoricalrecords.nq (Batch: 378 / Avg: 530)
>            Reporter: R Pope
>            Priority: Major
>
> Kia ora, Hi there,
> We have been using tdb2.tdbloader to load ~400,000,000 triples into our 
> triplestore. All the data is in nq format, having previously been converted 
> from JSONLD. The files we are loading range from ~10GB to ~50GB, producing a 
> triplestore of ~180GB including a text index. We run the loader in an HPC 
> environment, so we can request as much memory as we need, often using 1TB 
> for the load. The job runs in a Singularity image (similar to Docker), with 
> Slurm as the workload manager.
> All that aside, the load typically takes ~12-16 hours, and no more than 24 
> hours, with --loader=parallel and an average rate of ~5,000 triples per 
> second. We haven't needed to run the loader since October 2021, and on 
> recently running the load job again we are getting an overall average of 
> ~500 triples per second. We haven't been able to wait and see if it even 
> finishes.
> Has anyone else experienced such a big performance loss with tdb2.tdbloader 
> in the current or recent versions of Jena? Apart from whatever investigation 
> can be done on the Slurm/HPC side, does anyone have advice on performance?
> Thanks in advance
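
The batch rates in the loader log quoted above can be cross-checked from the timestamps alone. The following is a minimal sketch in Python, assuming the progress lines look like the "Add:" lines shown in this issue; the function name batch_rates and the regular expression are illustrative and not part of Jena. Because the log timestamps are whole seconds, the computed rates are approximate.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log quoted in this issue:
#   "13:03:25 INFO  loader          :: Add: 2,000,000 bdmhistoricalrecords.nq ..."
# Only the timestamp and the running total are parsed; a leading "> " quote
# prefix and a wrapped "(Batch: ... / Avg: ...)" tail are both tolerated.
PROGRESS = re.compile(r"(\d{2}:\d{2}:\d{2})\s+INFO\s+loader\s+::\s+Add:\s+([\d,]+)")


def batch_rates(lines):
    """Return (timestamp, running_total, items_per_second) for each interval
    between consecutive progress lines."""
    points = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            total = int(match.group(2).replace(",", ""))
            points.append((when, total))

    rates = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds < 0:
            # The log carries no date, so assume a single midnight rollover.
            seconds += 24 * 60 * 60
        if seconds > 0:
            rates.append((t1.strftime("%H:%M:%S"), n1, (n1 - n0) / seconds))
    return rates


if __name__ == "__main__":
    import sys

    # Usage: python batch_rates.py < loader.log
    for stamp, total, rate in batch_rates(sys.stdin):
        print(f"{stamp}  total={total:>12,}  rate={rate:,.0f}/s")
```

Applied to the log above, the interval from 13:03:25 to 13:33:06 is 1,781 seconds for 500,000 added items, which is roughly 280 per second and consistent with the "Batch: 280" figure the loader itself reports.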



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
