[ 
https://issues.apache.org/jira/browse/JENA-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135159#comment-17135159
 ] 

Andy Seaborne commented on JENA-1909:
-------------------------------------

That is really good to hear! 12B triples!

And good to hear that it is usable.

Jonas - could I ask one thing - could you email [email protected] to 
announce this success?

----

The temporary files are candidates for being on rotational disk, not SSD.  That 
would reduce the peak SSD space needed.

In tdbloader2, the first phase is better done on SSD. That's hard to avoid 
with the current TDB1 design (or TDB2 if a "tdb2.tdbloader2" were written).

The temporary files are more suitable for rotational disk while building.

While the secondary indexes (after the first phase) are written in a 
disk-friendly fashion, when used, all the indexes benefit from SSD.
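
As a rough illustration of keeping temporary files off the SSD: if the temporary files in question are the spill files of the Unix sort used in the index phase (an assumption here), sort's -T option selects the directory they are written to. The helper name sort_with_scratch and the paths in the usage note are hypothetical and not part of Jena; this is a sketch, not the loader's actual code.

```shell
#!/bin/sh
# Sketch: sort a file while keeping sort's temporary (spill) files in a
# chosen scratch directory, e.g. one on a rotational disk.
# sort_with_scratch is a hypothetical helper, not part of Jena.
sort_with_scratch() {
    scratch_dir=$1   # directory where sort may write its temporary files
    input_file=$2    # file to sort
    mkdir -p "$scratch_dir" || return 1
    # -T tells sort which directory to use for temporary files
    sort -T "$scratch_dir" "$input_file"
}
```

It could be called as `sort_with_scratch /mnt/hdd/sort-tmp input.tmp > sorted.tmp`, with both paths made up for the example. For tdbloader2 itself, the places to try would be whatever the script offers for passing arguments to sort, or the TMPDIR environment variable; check `tdbloader2 --help` for the version in use rather than relying on this sketch.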

So there are important learning points here as well.

 

> TDB1: tdbloader2 crashes
> ------------------------
>
>                 Key: JENA-1909
>                 URL: https://issues.apache.org/jira/browse/JENA-1909
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 3.15.0
>            Reporter: Jonas Sourlier
>            Priority: Major
>         Attachments: signature.asc, signature.asc, tdb2.log
>
>
> This might be related to JENA-1908, but since the stack trace is different, I 
> opened a second ticket.
> Tried to import the latest Wikidata dump into Apache Jena, using the 
> following setup:
>  * Ubuntu 20.04 on Windows 10 Subsystem for Linux
>  * Apache Jena 3.15.0
>  * Intel i7 4770K, 32GB RAM
>  * 
> {code:java}
> openjdk 11.0.7 2020-04-14
> OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
> OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, 
> sharing){code}
> These are the commands I have run:
> {code:java}
> wget -c http://mirror.easyname.ch/apache/jena/binaries/apache-jena-3.15.0.tar.gz
> tar -xvzf apache-jena-3.15.0.tar.gz
> mkdir data
> apache-jena-3.15.0/bin/tdbloader2 --phase data --loc data/ ../latest-all.ttl > tdb1.log 2> tdb2.log &
> apache-jena-3.15.0/bin/tdbloader2 --phase index --loc data/  > tdb1.log 2> tdb2.log &
> {code}
> The data phase ran fine, but the index phase crashed after about 10 hours. 
> The stack trace is attached to this ticket (tdb2.log).
> Here's the standard output:
> {code:java}
>  08:47:57 INFO -- TDB Bulk Loader Start
>  08:47:57 INFO Index Building Phase
>  08:47:57 INFO Creating Index SPO
>  08:47:58 INFO Sort SPO
>  18:26:19 INFO Sort SPO Completed
>  18:26:19 INFO Build SPO
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
