Paolo Castagna wrote:
>> With tdbloader2 I had a java.lang.OutOfMemoryError:

[...]

>> I'll try giving the JVM more RAM.
>
> I tried with -Xmx2048m, but I had the same problem.
> I'll try with -Xmx4096m.

This time, UNIX sort filled /tmp...
I'll try specifying --temporary-directory=DIR or, better, setting the $TMPDIR
env variable (this way there is no need to change the tdbloader2 script).
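
Before re-running, it may help to check which candidate temporary directory
actually has room. A minimal Python sketch, assuming /tmp and /mnt are the
candidates (as on this instance); the helper name pick_tmpdir is hypothetical.
GNU sort honors TMPDIR as well as -T/--temporary-directory, so exporting the
chosen path as TMPDIR should be enough.

```python
import shutil


def pick_tmpdir(candidates):
    """Return the existing candidate directory with the most free space."""
    best_path, best_free = None, -1
    for path in candidates:
        try:
            free = shutil.disk_usage(path).free  # bytes available
        except OSError:
            continue  # skip directories that are missing or unreadable
        if free > best_free:
            best_path, best_free = path, free
    return best_path


if __name__ == "__main__":
    # /tmp and /mnt are assumptions taken from this thread; adjust as needed.
    print("export TMPDIR=%s" % pick_tmpdir(["/tmp", "/mnt"]))
```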

>> tdbloader3 ran out of disk space (because it is writing temporary files
>> in /tmp and the available instance disk space is mounted on /mnt :-()
>> I'll see how to change/fix this and re-run.
>
> This ran almost to completion this time, but I was using the --spill-size-auto
> policy, which clearly needs improvement.
>

[...]

>
> I'll try with a fixed --spill-size 10000000.

This time, I was able to load the Freebase data dump (converted into
RDF) using tdbloader3.
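
To illustrate what a fixed spill size means, here is a generic sketch of
sorting with a fixed in-memory threshold: full buffers are sorted, spilled to
temporary files, and merged at the end. This is not tdbloader3's actual code.
The function names are hypothetical, and the sketch assumes the threshold
counts items (not bytes) and that items are newline-free strings.

```python
import heapq
import os
import tempfile


def spill_sort(items, spill_size, tmpdir=None):
    """Yield items in sorted order, buffering at most spill_size of them
    in memory before spilling a sorted run to a temporary file."""
    runs = []    # paths of the sorted runs written to disk
    buffer = []
    try:
        for item in items:
            buffer.append(item)
            if len(buffer) >= spill_size:
                runs.append(_write_run(sorted(buffer), tmpdir))
                buffer = []
        # Merge the on-disk runs with whatever is still in memory.
        streams = [_read_run(path) for path in runs]
        streams.append(iter(sorted(buffer)))
        yield from heapq.merge(*streams)
    finally:
        for path in runs:
            os.remove(path)


def _write_run(sorted_items, tmpdir):
    """Write one sorted run, one item per line, and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", dir=tmpdir, text=True)
    with os.fdopen(fd, "w") as out:
        for item in sorted_items:
            out.write(item + "\n")
    return path


def _read_run(path):
    """Stream the items of one sorted run back from disk."""
    with open(path) as run:
        for line in run:
            yield line.rstrip("\n")
```

A smaller spill_size means less memory but more runs to merge; passing tmpdir
lets the runs land on a volume with enough space instead of /tmp.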

This is how I ran tdbloader3 on an EC2 m1.xlarge instance (i.e. 15
GB of memory):
java -Djava.io.tmpdir=/mnt/data/tmp \
  -cp target/jena-tdbloader3-0.1-incubating-SNAPSHOT-jar-with-dependencies.jar \
  -server -d64 -Xmx12288M cmd.tdbloader3 --no-stats --compression \
  --spill-size 10000000 --loc /mnt/data/freebase \
  /mnt/data/freebase2rdf/freebase-datadump-rdf.nt.gz

Total elapsed time to load 618,465,279 triples:
Total: 618,465,279 tuples : 53,608.12 seconds : 11,536.78 tuples/sec
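
As a quick sanity check of the summary line above, the rate is simply the
triple count divided by the elapsed seconds. The variable names are only for
illustration; the two input figures are copied from the summary line.

```python
triples = 618_465_279   # from the summary line above
seconds = 53_608.12     # total elapsed time, in seconds

rate = triples / seconds   # tuples per second
hours = seconds / 3600     # elapsed time in hours

print(f"{rate:,.2f} tuples/sec over {hours:.2f} hours")
```

This should reproduce the logged rate and shows the whole load took a little
under 15 hours.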

This is the log:
Mar  6 11:43:59 ip-10-53-130-32 build: INFO  Load:
/mnt/data/freebase2rdf/freebase-datadump-rdf.nt.gz -- 2012/03/06
11:43:59 UTC
Mar  6 11:44:00 ip-10-53-130-32 build: INFO  Add: 50,000 tuples
(Batch: 35,335 / Avg: 35,335)
Mar  6 11:44:01 ip-10-53-130-32 build: INFO  Add: 100,000 tuples
(Batch: 68,212 / Avg: 46,554)
[...]
Mar  6 15:32:38 ip-10-53-130-32 build: INFO  Add: 618,450,000 tuples
(Batch: 89,766 / Avg: 45,079)
Mar  6 15:32:38 ip-10-53-130-32 build: INFO  Node Table (1/3):
building nodes.dat and sorting hash|id ...
Mar  6 17:24:46 ip-10-53-130-32 build: INFO  Add: 50,000 records for
node table (1/3) phase (Batch: 7 / Avg: 7)
Mar  6 17:24:47 ip-10-53-130-32 build: INFO  Add: 100,000 records for
node table (1/3) phase (Batch: 82,236 / Avg: 14)
[...]
Mar  6 21:23:09 ip-10-53-130-32 build: INFO  Add: 1,855,350,000
records for node table (1/3) phase (Batch: 216,450 / Avg: 88,220)
Mar  6 21:23:09 ip-10-53-130-32 build: INFO  Total: 1,855,395,837
tuples : 21,031.01 seconds : 88,221.91 tuples/sec [2012/03/06 21:23:09
UTC]
Mar  6 21:23:40 ip-10-53-130-32 build: INFO  Node Table (2/3):
generating input data using node ids...
Mar  6 23:00:17 ip-10-53-130-32 build: INFO  Add: 50,000 records for
node table (2/3) phase (Batch: 8 / Avg: 8)
Mar  6 23:00:17 ip-10-53-130-32 build: INFO  Add: 100,000 records for
node table (2/3) phase (Batch: 96,899 / Avg: 17)
[...]
Mar  7 01:04:18 ip-10-53-130-32 build: INFO  Add: 618,450,000 records
for node table (2/3) phase (Batch: 95,969 / Avg: 46,718)
Mar  7 01:04:18 ip-10-53-130-32 build: INFO  Total: 618,463,448 tuples
: 13,237.97 seconds : 46,718.90 tuples/sec [2012/03/07 01:04:18 UTC]
Mar  7 01:04:23 ip-10-53-130-32 build: INFO  Node Table (3/3):
building node table B+Tree index (i.e. node2id.dat and node2id.idn
files)...
Mar  7 01:04:38 ip-10-53-130-32 build: INFO  Add: 50,000 records for
node table (3/3) phase (Batch: 3,511 / Avg: 3,511)
Mar  7 01:04:38 ip-10-53-130-32 build: INFO  Add: 100,000 records for
node table (3/3) phase (Batch: 375,939 / Avg: 6,958)
[...]
Mar  7 01:07:21 ip-10-53-130-32 build: INFO  Add: 149,050,000 records
for node table (3/3) phase (Batch: 980,392 / Avg: 838,537)
Mar  7 01:07:24 ip-10-53-130-32 build: INFO  Total: 149,066,002 tuples
: 180.42 seconds : 826,225.75 tuples/sec [2012/03/07 01:07:24 UTC]
Mar  7 01:07:27 ip-10-53-130-32 build: INFO  Index: creating SPO index...
Mar  7 01:08:14 ip-10-53-130-32 build: INFO  Add: 50,000 records to
SPO (Batch: 1,065 / Avg: 1,065)
Mar  7 01:08:15 ip-10-53-130-32 build: INFO  Add: 100,000 records to
SPO (Batch: 54,764 / Avg: 2,090)
[...]
Mar  7 01:18:47 ip-10-53-130-32 build: INFO  Add: 618,450,000 records
to SPO (Batch: 1,020,408 / Avg: 908,977)
Mar  7 01:18:50 ip-10-53-130-32 build: INFO  Total: 618,463,449 tuples
: 682.99 seconds : 905,528.69 tuples/sec [2012/03/07 01:18:50 UTC]
Mar  7 01:18:50 ip-10-53-130-32 build: INFO  Index: creating GSPO index...
Mar  7 01:18:50 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.12
seconds : 0.00 tuples/sec [2012/03/07 01:18:50 UTC]
Mar  7 01:18:56 ip-10-53-130-32 build: INFO  Index: sorting data for
POS index...
Mar  7 01:18:57 ip-10-53-130-32 build: INFO  Add: 50,000 records to
POS (Batch: 210,084 / Avg: 210,084)
Mar  7 01:18:57 ip-10-53-130-32 build: INFO  Add: 100,000 records to
POS (Batch: 1,724,137 / Avg: 374,531)
[...]
Mar  7 01:47:03 ip-10-53-130-32 build: INFO  Add: 618,450,000 records
to POS (Batch: 4,545,454 / Avg: 366,790)
Mar  7 01:47:03 ip-10-53-130-32 build: INFO  Total: 618,463,449 tuples
: 1,686.18 seconds : 366,783.97 tuples/sec [2012/03/07 01:47:03 UTC]
Mar  7 01:47:03 ip-10-53-130-32 build: INFO  Index: creating POS index...
Mar  7 01:47:41 ip-10-53-130-32 build: INFO  Add: 50,000 records to
POS (Batch: 1,321 / Avg: 1,321)
Mar  7 01:47:41 ip-10-53-130-32 build: INFO  Add: 100,000 records to
POS (Batch: 1,086,956 / Avg: 2,639)
[...]
Mar  7 01:57:37 ip-10-53-130-32 build: INFO  Add: 618,450,000 records
to POS (Batch: 1,162,790 / Avg: 974,417)
Mar  7 01:57:42 ip-10-53-130-32 build: INFO  Total: 618,463,449 tuples
: 638.92 seconds : 967,976.50 tuples/sec [2012/03/07 01:57:42 UTC]
Mar  7 01:57:47 ip-10-53-130-32 build: INFO  Index: sorting data for
OSP index...
Mar  7 01:57:47 ip-10-53-130-32 build: INFO  Add: 50,000 records to
OSP (Batch: 373,134 / Avg: 373,134)
Mar  7 01:57:47 ip-10-53-130-32 build: INFO  Add: 100,000 records to
OSP (Batch: 549,450 / Avg: 444,444)
[...]
Mar  7 02:26:23 ip-10-53-130-32 build: INFO  Add: 618,450,000 records
to OSP (Batch: 4,166,666 / Avg: 360,257)
Mar  7 02:26:23 ip-10-53-130-32 build: INFO  Total: 618,463,449 tuples
: 1,716.69 seconds : 360,264.44 tuples/sec [2012/03/07 02:26:23 UTC]
Mar  7 02:26:23 ip-10-53-130-32 build: INFO  Index: creating OSP index...
Mar  7 02:27:02 ip-10-53-130-32 build: INFO  Add: 50,000 records to
OSP (Batch: 1,284 / Avg: 1,284)
Mar  7 02:27:03 ip-10-53-130-32 build: INFO  Add: 100,000 records to
OSP (Batch: 364,963 / Avg: 2,560)
[...]
Mar  7 02:37:18 ip-10-53-130-32 build: INFO  Add: 618,450,000 records
to OSP (Batch: 1,020,408 / Avg: 944,877)
Mar  7 02:37:22 ip-10-53-130-32 build: INFO  Total: 618,463,449 tuples
: 658.94 seconds : 938,578.94 tuples/sec [2012/03/07 02:37:22 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: sorting data for
GPOS index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.03
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: creating GPOS index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: sorting data for
GOSP index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: creating GOSP index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: sorting data for
POSG index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: creating POSG index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: sorting data for
OSPG index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: creating OSPG index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: sorting data for
SPOG index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Index: creating SPOG index...
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 0 tuples : 0.00
seconds : 0.00 tuples/sec [2012/03/07 02:37:27 UTC]
Mar  7 02:37:27 ip-10-53-130-32 build: INFO  Total: 618,465,279 tuples
: 53,608.12 seconds : 11,536.78 tuples/sec [2012/03/07 02:37:27 UTC]
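
To see where the time goes, the per-phase "Total:" lines can be added up. The
sketch below hard-codes the seconds from the log above (the empty quad-index
phases are left out); the phase labels are my own shorthand. The initial parse
of the input has no "Total:" line of its own in this excerpt, so it ends up in
the remainder together with the small gaps between phases.

```python
# Seconds per phase, copied from the "Total:" lines in the log above.
phases = [
    ("node table (1/3)", 21031.01),
    ("node table (2/3)", 13237.97),
    ("node table (3/3)", 180.42),
    ("SPO index", 682.99),
    ("POS sort", 1686.18),
    ("POS index", 638.92),
    ("OSP sort", 1716.69),
    ("OSP index", 658.94),
]
total = 53608.12  # overall elapsed seconds from the final "Total:" line

accounted = sum(seconds for _, seconds in phases)
# The remainder is mostly the initial parse, plus gaps between phases.
phases.append(("initial load and other", total - accounted))

for name, seconds in sorted(phases, key=lambda p: p[1], reverse=True):
    print(f"{name:24s} {seconds:10.2f} s  {100 * seconds / total:5.1f}%")
```

By this accounting, the first two node table phases take roughly 64% of the
run, so that is where tuning is most likely to pay off.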

Paolo
