That's on commodity hardware: http://www.lotico.com/index.php/JENA_Loader_Benchmarks

Load times are just load times. Including indexing I'm down to 137,217 t/s; sure, with a billion triples I'm down to 87 kt/s, but that's still reasonable for most of my use cases.
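For reference, the CLEAR ALL / compact behaviour discussed below can be driven from the Java API as well as the command line. A minimal sketch, assuming Jena 4.x on the classpath (the database path is a placeholder):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.tdb2.DatabaseMgr;
    import org.apache.jena.tdb2.TDB2Factory;
    import org.apache.jena.update.UpdateAction;

    public class ClearAndCompact {
        public static void main(String[] args) {
            // Connect to an existing TDB2 database directory.
            Dataset ds = TDB2Factory.connectDataset("/data/DB2");

            // CLEAR ALL removes the quads logically, but the old storage is
            // kept: active read transactions may still be looking at it.
            ds.begin(ReadWrite.WRITE);
            try {
                UpdateAction.parseExecute("CLEAR ALL", ds);
                ds.commit();
            } finally {
                ds.end();
            }

            // Disk space is only reclaimed by compaction, which writes a new
            // generation of the database alongside the old one.
            DatabaseMgr.compact(ds.asDatasetGraph());
        }
    }

(tdb2.tdbcompact does the same from the command line. Note that compaction needs enough free disk for the new generation while the old one still exists.)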
On Tue, Nov 23, 2021 at 10:44 AM Andy Seaborne <a...@apache.org> wrote:
>
> On 22/11/2021 21:14, Marco Neumann wrote:
> > Yes, I just had a look at one of my own datasets with 180mt and a
> > footprint of 28G. The overhead is not too bad at 10-20% vs raw nt files.
> >
> > I was surprised that the CLEAR ALL directive doesn't remove/release
> > disk space. Does TDB2 require a commit to release disk space?
>
> Any active read transactions can still see the old data. You can't
> delete it for real.
>
> Run compact.
>
> > Impressed to see that load times went up to 250k/s
>
> What was the hardware?
>
> > with 4.2, more than twice the speed I have seen with 3.15. Not sure if
> > this is OS (Ubuntu 20.04.3 LTS) related.
>
> You won't get 250k at scale. Loading rate slows for algorithmic reasons
> and system reasons.
>
> Now try 500m!
>
> > Maybe we should make a recommendation to the wikidata team to provide
> > us with a production-environment-type machine to run some load and
> > query tests.
> >
> > On Mon, Nov 22, 2021 at 8:43 PM Andy Seaborne <a...@apache.org> wrote:
> >
> >> On 21/11/2021 21:03, Marco Neumann wrote:
> >>> What's the disk footprint these days for 1b on tdb2?
> >>
> >> Quite a lot. For 1B BSBM, ~125G (which is a bit heavy on
> >> significant-sized literals - the nodes themselves are 50G). Obviously,
> >> for current WD-scale usage a sprinkling of compression would be good!
> >>
> >> One thing xloader gives us is that it makes it possible to load on a
> >> spinning disk. (It also has lower peak intermediate file space and is
> >> faster because it does not fall into the slow loading mode for the
> >> node table that tdbloader2 sometimes did.)
> >>
> >>     Andy
> >>
> >>> On Sun, Nov 21, 2021 at 8:00 PM Andy Seaborne <a...@apache.org> wrote:
> >>>
> >>>> On 20/11/2021 14:21, Andy Seaborne wrote:
> >>>>> Wikidata are looking for a replacement for BlazeGraph.
> >>>>>
> >>>>> About WDQS, current scale and current challenges:
> >>>>> https://youtu.be/wn2BrQomvFU?t=9148
> >>>>>
> >>>>> And they are in the process of appointing a graph consultant
> >>>>> (5-month contract):
> >>>>> https://boards.greenhouse.io/wikimedia/jobs/3546920
> >>>>>
> >>>>> and Apache Jena came up:
> >>>>> https://phabricator.wikimedia.org/T206560#7517212
> >>>>>
> >>>>> Realistically?
> >>>>>
> >>>>> Full Wikidata is 16B triples. Very hard to load - xloader may help,
> >>>>> though the goal for that was to make loading the truthy subset (5B)
> >>>>> easier. 5B -> 16B is not a trivial step.
> >>>>
> >>>> And it's growing at about 1B per quarter.
> >>>>
> >>>> https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy
> >>>>
> >>>>> Even if Wikidata loads, it would be impractically slow as TDB is
> >>>>> today. (Yes, that's fixable; not practical in their timescales.)
> >>>>>
> >>>>> The current discussions feel more like they are looking for a
> >>>>> "product" - a triplestore that they can use - rather than a
> >>>>> collaboration.
> >>>>>
> >>>>>     Andy

--
---
Marco Neumann
KONA