That's on commodity hardware.

http://www.lotico.com/index.php/JENA_Loader_Benchmarks

Load times are just load times, though - including indexing I'm down to 137,217 t/s.

Sure, with a billion triples I am down to 87k t/s, but that's still reasonable for most of my use cases.
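
In case anyone wants to reproduce the numbers: these are plain TDB2 bulk
loads, along these lines (the location and file name are placeholders):

  tdb2.tdbloader --loc /data/DB dataset.nt.gz

The loader prints a running triples/second figure as it goes.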


On Tue, Nov 23, 2021 at 10:44 AM Andy Seaborne <a...@apache.org> wrote:

>
>
> On 22/11/2021 21:14, Marco Neumann wrote:
> > Yes, I just had a look at one of my own datasets with 180M triples and a
> > footprint of 28G. The overhead is not too bad, 10-20% vs the raw NT files.
> >
> > I was surprised that the CLEAR ALL directive doesn't remove/release disk
> > space. Does TDB2 require a commit to release it?
>
> Any active read transactions can still see the old data. You can't
> delete it for real.
>
> Run compact.
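>
> For example, a sketch (point --loc at your database directory):
>
>     tdb2.tdbcompact --loc /path/to/DB
>
> Compaction writes the live data into a fresh generation inside the
> database directory; older generations can then be deleted.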
>
> > I was impressed to see that load rates went up to 250k t/s
>
> What was the hardware?
>
> > with 4.2 - more than
> > twice the speed I have seen with 3.15. Not sure if this is OS (Ubuntu
> > 20.04.3 LTS) related.
>
> You won't get 250k at scale. Loading rate slows for algorithmic reasons
> (the indexes grow, so each insert touches more data) and for system
> reasons (caches no longer fit in RAM).
>
> Now try 500m!
>
> > Maybe we should recommend that the Wikidata team provide us with a
> > production-environment-type machine to run some load and query tests.
> >
> >
> >
> >
> >
> >
> > On Mon, Nov 22, 2021 at 8:43 PM Andy Seaborne <a...@apache.org> wrote:
> >
> >>
> >>
> >> On 21/11/2021 21:03, Marco Neumann wrote:
> >>> What's the disk footprint these days for 1B triples on TDB2?
> >>
> >> Quite a lot. For 1B BSBM, ~125G (BSBM is a bit heavy on sizeable
> >> literals - the nodes themselves are 50G). Obviously, for current WD-scale
> >> usage a sprinkling of compression would be good!
> >>
> >> One thing xloader gives us is that it makes it possible to load on a
> >> spinning disk. (It also has lower peak intermediate file space, and it
> >> is faster because it does not fall into the slow node-table loading mode
> >> that tdbloader2 sometimes did.)
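> >>
> >> A sketch of a run (script name and flags as in the development builds,
> >> so check your version):
> >>
> >>     tdb2.xloader --loc /data/DB truthy.nt.gz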
> >>
> >>       Andy
> >>
> >>>
> >>> On Sun, Nov 21, 2021 at 8:00 PM Andy Seaborne <a...@apache.org> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 20/11/2021 14:21, Andy Seaborne wrote:
> >>>>> Wikidata are looking for a replacement for BlazeGraph
> >>>>>
> >>>>> About WDQS, current scale and current challenges
> >>>>>      https://youtu.be/wn2BrQomvFU?t=9148
> >>>>>
> >>>>> And in the process of appointing a graph consultant (5-month
> >>>>> contract):
> >>>>> https://boards.greenhouse.io/wikimedia/jobs/3546920
> >>>>>
> >>>>> and Apache Jena came up:
> >>>>> https://phabricator.wikimedia.org/T206560#7517212
> >>>>>
> >>>>> Realistically?
> >>>>>
> >>>>> Full wikidata is 16B triples. Very hard to load - xloader may help,
> >>>>> though the goal for that was to make loading the truthy subset (5B)
> >>>>> easier. 5B -> 16B is not a trivial step.
> >>>>
> >>>> And it's growing at about 1B per quarter.
> >>>>
> >>>>
> >>>> https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy
> >>>>
> >>>>>
> >>>>> Even if wikidata loads, it would be impractically slow as TDB is
> >>>>> today. (Yes, that's fixable; not practical in their timescales.)
> >>>>>
> >>>>> The current discussions feel more like they are looking for a
> >>>>> "product" - a triplestore that they can use - rather than a
> >>>>> collaboration.
> >>>>>
> >>>>>        Andy
> >>>>
> >>>
> >>>
> >>
> >
> >
>


-- 


---
Marco Neumann
KONA
