Yes, I just had a look at one of my own datasets with 180M triples and a
disk footprint of 28G. The overhead is not too bad, at 10-20% vs. the raw
nt files.

I was surprised that the CLEAR ALL directive doesn't release disk space.
Does TDB2 require a commit to release disk space?

I was impressed to see that load rates went up to 250k triples/s with 4.2,
more than twice the speed I have seen with 3.15. Not sure if this is OS
related (Ubuntu 20.04.3 LTS).
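
For a rough sense of scale, here is a back-of-envelope sketch in Python
using only the numbers mentioned in this thread. It assumes "G" means 10^9
bytes and that the 250k/s rate stays constant at larger sizes, which a real
load almost certainly will not. The function names are mine, for
illustration only.

```python
# Back-of-envelope figures from this thread.
# Assumptions: "G" = 10^9 bytes; load rate is constant regardless of size.

GB = 1e9


def bytes_per_triple(footprint_bytes: float, triples: float) -> float:
    """Average on-disk bytes per triple for a loaded database."""
    return footprint_bytes / triples


def load_hours(triples: float, rate_per_sec: float) -> float:
    """Hours needed to load `triples` at a constant rate."""
    return triples / rate_per_sec / 3600


mine = bytes_per_triple(28 * GB, 180e6)   # 180M triples, 28G footprint
bsbm = bytes_per_triple(125 * GB, 1e9)    # 1B BSBM, ~125G footprint

print(f"{mine:.0f} bytes/triple (180M dataset)")
print(f"{bsbm:.0f} bytes/triple (1B BSBM)")

# 180M (my dataset), 5B (truthy subset), 16B (full Wikidata)
for n in (180e6, 5e9, 16e9):
    print(f"{n:.1e} triples at 250k/s: {load_hours(n, 250_000):.1f} h")
```

These are optimistic lower bounds on load time, not predictions.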

Maybe we should recommend that the Wikidata team provide us with a
production-type machine to run some load and query tests.

On Mon, Nov 22, 2021 at 8:43 PM Andy Seaborne <a...@apache.org> wrote:

>
>
> On 21/11/2021 21:03, Marco Neumann wrote:
> > What's the disk footprint these days for 1b on tdb2?
>
> Quite a lot. For 1B BSBM, ~125G (which is a bit heavy on significant
> sized literals - the nodes themselves are 50G). Obviously, for current
> WD-scale usage a sprinkling of compression would be good!
>
> One thing xloader gives us is that it makes it possible to load on a
> spinning disk. (It also has lower peak intermediate file space and is
> faster because it does not fall into the slow loading mode for the node
> table that tdbloader2 sometimes did.)
>
>      Andy
>
> >
> > On Sun, Nov 21, 2021 at 8:00 PM Andy Seaborne <a...@apache.org> wrote:
> >
> >>
> >>
> >> On 20/11/2021 14:21, Andy Seaborne wrote:
> >>> Wikidata are looking for a replacement for BlazeGraph
> >>>
> >>> About WDQS, current scale and current challenges
> >>>     https://youtu.be/wn2BrQomvFU?t=9148
> >>>
> >>> And in the process of appointing a graph consultant (5 month contract):
> >>> https://boards.greenhouse.io/wikimedia/jobs/3546920
> >>>
> >>> and Apache Jena came up:
> >>> https://phabricator.wikimedia.org/T206560#7517212
> >>>
> >>> Realistically?
> >>>
> >>> Full wikidata is 16B triples. Very hard to load - xloader may help
> >>> though the goal for that was to make loading the truthy subset (5B)
> >>> easier. 5B -> 16B is not a trivial step.
> >>
> >> And it's growing at about 1B per quarter.
> >>
> >>
> >> https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy
> >>
> >>>
> >>> Even if wikidata loads, it would be impractically slow as TDB is today.
> >>> (yes, that's fixable; not practical in their timescales.)
> >>>
> >>> The current discussions feel more like they are looking for a "product"
> >>> - a triplestore that they can use - rather than a collaboration.
> >>>
> >>>       Andy
> >>
> >
> >
>


-- 


---
Marco Neumann
KONA
