Yes, I just had a look at one of my own datasets with 180M triples and a footprint of 28G. The overhead is not too bad, at 10-20% over the raw N-Triples files.
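
As a rough back-of-envelope check on the footprint figures in this thread, here is a minimal sketch. It assumes "G" means 10^9 bytes and that disk usage scales linearly with triple count, which is only an approximation: BSBM is heavy on large literals and Wikidata will have a different profile. The function names are made up for illustration and are not part of any Jena tooling.

```python
# Back-of-envelope TDB2 footprint arithmetic from the figures in this thread.
# Assumptions (not from Jena): "G" = 10**9 bytes, and footprint grows
# linearly with the number of triples.

GB = 10**9


def bytes_per_triple(footprint_gb, triples):
    """Average on-disk bytes per triple for a loaded dataset."""
    return footprint_gb * GB / triples


def projected_footprint_gb(triples, per_triple_bytes):
    """Linear extrapolation of disk footprint, in GB."""
    return triples * per_triple_bytes / GB


# Figures quoted in the thread.
own_rate = bytes_per_triple(28, 180_000_000)       # 180M triples, 28G
bsbm_rate = bytes_per_triple(125, 1_000_000_000)   # 1B BSBM, ~125G

print(f"own dataset: {own_rate:.0f} bytes/triple")
print(f"1B BSBM:     {bsbm_rate:.0f} bytes/triple")

# Hypothetical projection at the BSBM rate for the Wikidata sizes mentioned.
for label, count in (("truthy subset (5B)", 5_000_000_000),
                     ("full Wikidata (16B)", 16_000_000_000)):
    size_gb = projected_footprint_gb(count, bsbm_rate)
    print(f"{label}: ~{size_gb:.0f} GB")
```

At the BSBM rate this lands in the region of 2 TB for the full 16B triples, before any compression, so it should be read as an order-of-magnitude estimate only.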

I was surprised that the CLEAR ALL directive doesn't release disk space. Does TDB2 require a commit to release disk space?

I was impressed to see that load rates went up to 250k/s with 4.2, more than twice the speed I have seen with 3.15. Not sure if this is OS (Ubuntu 20.04.3 LTS) related.

Maybe we should make a recommendation to the wikidata team to provide us with a production environment type machine to run some load and query tests.

On Mon, Nov 22, 2021 at 8:43 PM Andy Seaborne <a...@apache.org> wrote:

>
> On 21/11/2021 21:03, Marco Neumann wrote:
> > What's the disk footprint these days for 1b on tdb2?
>
> Quite a lot. For 1B BSBM, ~125G (which is a bit heavy on significant
> sized literals - the node themselves are 50G). Obvious for current WD
> scale usage a sprinkling of compression would be good!
>
> One thing xloader gives us is that it makes it possible to load on a
> spinning disk. (it also has lower peak intermediate file space and
> faster because it does not fall into a slow loading mode for the node
> table that tdbloader2 did sometimes.)
>
>      Andy
>
> >
> > On Sun, Nov 21, 2021 at 8:00 PM Andy Seaborne <a...@apache.org> wrote:
> >
> >>
> >> On 20/11/2021 14:21, Andy Seaborne wrote:
> >>> Wikidata are looking for a replace for BlazeGraph
> >>>
> >>> About WDQS, current scale and current challenges
> >>> https://youtu.be/wn2BrQomvFU?t=9148
> >>>
> >>> And in the process of appointing a graph consultant: (5 month contract):
> >>> https://boards.greenhouse.io/wikimedia/jobs/3546920
> >>>
> >>> and Apache Jena came up:
> >>> https://phabricator.wikimedia.org/T206560#7517212
> >>>
> >>> Realistically?
> >>>
> >>> Full wikidata is 16B triples. Very hard to load - xloader may help
> >>> though the goal for that was to make loading the truthy subset (5B)
> >>> easier. 5B -> 16B is not a trivial step.
> >>
> >> And it's growing at about 1B per quarter.
> >>
> >> https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy
> >>
> >>>
> >>> Even if wikidata loads, it would be impractically slow as TDB is today.
> >>> (yes, that's fixable; not practical in their timescales.)
> >>>
> >>> The current discussions feel more like they are looking for a "product"
> >>> - a triplestore that they are use - rather than a collaboration.
> >>>
> >>> Andy
> >>
> >
>

--
---
Marco Neumann
KONA