On 21/11/2021 21:03, Marco Neumann wrote:
What's the disk footprint these days for 1B triples on TDB2?
Quite a lot. For 1B BSBM, ~125G (BSBM is a bit heavy on significantly
sized literals - the nodes themselves are 50G). Obviously, for current
WD-scale usage a sprinkling of compression would be good!
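
As a rough back-of-envelope sketch of what those figures imply, the
snippet below derives bytes per triple from the ~125G / 50G numbers and
scales them linearly to the 5B and 16B sizes mentioned further down the
thread. The linear scaling, the reading of "G" as 10^9 bytes, and the
function names are assumptions made for illustration; BSBM is
literal-heavy, so real Wikidata numbers could differ considerably.

```python
# Back-of-envelope arithmetic only; not a measurement of any real load.
GB = 10**9  # assumption: "G" in the thread is read as 10^9 bytes


def bytes_per_triple(total_bytes: float, triples: int) -> float:
    """Average on-disk bytes per triple."""
    return total_bytes / triples


def linear_footprint_gb(triples: int, per_triple_bytes: float) -> float:
    """Naive linear extrapolation of footprint, in GB (an assumption)."""
    return triples * per_triple_bytes / GB


BSBM_TRIPLES = 1_000_000_000
total_bytes = 125 * GB   # ~125G quoted for 1B BSBM
node_bytes = 50 * GB     # ~50G of that is the node table

per_triple = bytes_per_triple(total_bytes, BSBM_TRIPLES)
per_triple_nodes = bytes_per_triple(node_bytes, BSBM_TRIPLES)
print(f"total: {per_triple:.0f} B/triple, nodes: {per_triple_nodes:.0f} B/triple")

for label, n in [("truthy subset (5B)", 5_000_000_000),
                 ("full Wikidata (16B)", 16_000_000_000)]:
    print(f"{label}: ~{linear_footprint_gb(n, per_triple):.0f} GB if BSBM ratios held")
```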
One thing xloader gives us is that it makes it possible to load on a
spinning disk. (It also has lower peak intermediate file space and is
faster, because it does not fall into the slow loading mode for the node
table that tdbloader2 sometimes did.)
Andy
On Sun, Nov 21, 2021 at 8:00 PM Andy Seaborne <a...@apache.org> wrote:
On 20/11/2021 14:21, Andy Seaborne wrote:
Wikidata are looking for a replacement for BlazeGraph
About WDQS, current scale and current challenges
https://youtu.be/wn2BrQomvFU?t=9148
And they are in the process of appointing a graph consultant (5 month contract):
https://boards.greenhouse.io/wikimedia/jobs/3546920
and Apache Jena came up:
https://phabricator.wikimedia.org/T206560#7517212
Realistically?
Full Wikidata is 16B triples. Very hard to load - xloader may help,
though the goal for that was to make loading the truthy subset (5B)
easier. 5B -> 16B is not a trivial step.
And it's growing at about 1B per quarter.
https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy
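
To make the growth figure concrete, here is a minimal sketch that
projects the triple count forward from 16B at about 1B per quarter. The
constant linear growth rate and the function name are assumptions for
illustration only; actual growth may not be linear.

```python
# Simple linear projection; the constant growth rate is an assumption.
def projected_triples_b(current_b: float, growth_b_per_quarter: float,
                        quarters: int) -> float:
    """Projected size in billions of triples after a number of quarters."""
    return current_b + growth_b_per_quarter * quarters


for quarters in (0, 4, 8):
    size = projected_triples_b(16, 1, quarters)
    print(f"after {quarters} quarters: ~{size:.0f}B triples")
```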
Even if Wikidata loads, it would be impractically slow as TDB is today.
(Yes, that's fixable; not practical in their timescales.)
The current discussions feel more like they are looking for a "product"
- a triplestore that they can use - rather than a collaboration.
Andy