OK, thank you for making this explicit. I suppose my curiosity here
revolved around where we (as an Any23 community) could/want to get
involved in making Any23 a better framework and potentially a
dependency within the semantic web projects within the ASF.

  However, I can't help but think that there are areas where we
(Any23 and Jena) can find commonality.

It would be good.  Add Stanbol and the-project-née-Linda.

together with a new I/O architecture:


accepted 100%

which is now ready for migrating into the codebase (after a pause due to
RDF-WG work and non-Apache time).

Now done ...


accepted 110%


In particular, the parser pipeline has been heavily tuned to get good load
performance for TDB.  (Long story to do with how Java I/O has hidden costs.)
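One classic hidden cost in Java I/O is that each read() on a raw
FileInputStream is a system call. A minimal, self-contained sketch of the
effect (not Jena code; the class and file names here are illustrative):

```java
import java.io.*;
import java.nio.file.*;

public class BufferedReadDemo {
    // Count bytes by single-byte reads. On a raw FileInputStream every
    // call crosses into the OS; BufferedInputStream batches reads into an
    // in-memory buffer, so the same loop makes far fewer system calls.
    static long countBytes(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("demo", ".nt");
        Files.write(f, "<s> <p> <o> .\n".repeat(100_000).getBytes());

        long raw, buffered;
        try (InputStream in = new FileInputStream(f.toFile())) {
            raw = countBytes(in);            // one syscall per byte
        }
        try (InputStream in = new BufferedInputStream(new FileInputStream(f.toFile()))) {
            buffered = countBytes(in);       // syscalls amortised over a buffer
        }
        System.out.println(raw == buffered); // same bytes, very different cost
        Files.delete(f);
    }
}
```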


Jena framework specific?

Yes and no.

"Yes" -- the parsers use Jena classes but very few.

"no" -- but only as carriers for triples and terms. Output is to a Sink<Triple>, so output can go directly to a graph, a print stream, direct to storage (TDB), a stream filter, whatever.
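To make the Sink<Triple> idea concrete, here is a minimal, self-contained
sketch of the pattern -- the interface and class names below are
illustrative stand-ins, not Jena's actual classes: the parser pushes each
item to a sink, and the sink decides whether to collect, print, store, or
discard it.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal mirror of the sink idea: the producer never knows the destination.
interface Sink<T> {
    void send(T item);
    void flush();
    void close();
}

// Collects items into a list -- stands in for "directly to a graph".
class CollectorSink<T> implements Sink<T> {
    final List<T> items = new ArrayList<>();
    public void send(T item) { items.add(item); }
    public void flush() {}
    public void close() {}
}

// Counts and drops everything -- the kind of sink used when benchmarking a parser.
class DiscardSink<T> implements Sink<T> {
    long count = 0;
    public void send(T item) { count++; }
    public void flush() {}
    public void close() {}
}

public class SinkDemo {
    // A toy "parser" that emits one string per line into whatever sink it is given.
    static void parse(String data, Sink<String> sink) {
        for (String line : data.split("\n")) sink.send(line);
        sink.flush();
        sink.close();
    }

    public static void main(String[] args) {
        CollectorSink<String> collect = new CollectorSink<>();
        parse("<s> <p> <o> .\n<s> <p> <o2> .", collect);
        System.out.println(collect.items.size()); // 2

        DiscardSink<String> discard = new DiscardSink<>();
        parse("<s> <p> <o> .\n<s> <p> <o2> .", discard);
        System.out.println(discard.count); // 2
    }
}
```

The point of the design is that the parser is written once and the
destination is swapped by handing it a different sink.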

The carrier objects are from Jena's SPI - AKA the graph API, which is just Graph/Triple/Node/DatasetGraph/Quad (+datatypes).

ARP (the RDF/XML parser) does have its own abstraction of nodes to isolate it from the rest of Jena. Once upon a time it did run separately (it still can, but it's packaged with Jena now). All the RIOT parsers are doing is using a zero-copy approach to the same thing. Churning objects during N-Triples parsing is a measurable cost. The RIOT N-Triples parser does about 200K+ triples/s in ideal conditions [2].
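The object-churn point can be illustrated with a toy scan: String.split()
allocates a fresh String per token, while an index-based scan walks the
same characters and only materialises a term when the caller actually
needs one. A simplified sketch (whitespace-separated terms only, no
handling of spaces inside literals -- real N-Triples tokenising is more
involved):

```java
public class ZeroCopyTokens {
    // Count the terms on an N-Triples-style line without allocating a
    // String per token: track [start, end) indices into the original
    // buffer instead of cutting substrings.
    static int countTerms(String line) {
        int terms = 0;
        int i = 0, n = line.length();
        while (i < n) {
            while (i < n && line.charAt(i) == ' ') i++;   // skip spaces
            if (i >= n) break;
            int start = i;
            while (i < n && line.charAt(i) != ' ') i++;   // end of token
            // Compare in place; don't count the trailing '.' terminator.
            if (!line.regionMatches(start, ".", 0, i - start))
                terms++;
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(countTerms("<s> <p> \"o\" .")); // 3
    }
}
```

At hundreds of thousands of lines per second, skipping those per-token
allocations is exactly the kind of saving that shows up in a profile.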


The Jena API is built on the SPI - the API is much bigger than the SPI, which is really quite small and could be smaller.

        Andy

[1] http://mail-archives.apache.org/mod_mbox/jena-dev/201207.mbox/%3C5009735B.5020908%40apache.org%3E

[2] ideal: server or workstation class PC not doing anything else at the time. No other disk activity, no CPU activity. Materialise triples but send to a Sink that throws everything away.

gzip vs raw expanded file makes a small difference - raw is faster. But very large NT files are often written all in one go, so they are laid out well on disk for the disk interface to stream, and SSDs are not that much faster if the I/O is not random (I see < x2 faster for > x10 the cost mentioned; presumably the x10 is dropping).
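Reading gzipped N-Triples just means adding one decompression layer over
the byte stream; the triples that come out are identical. A
self-contained, in-memory sketch of the round trip (illustrative class
names, standard java.util.zip only):

```java
import java.io.*;
import java.util.zip.*;

public class GzipVsRaw {
    // Compress a payload in memory with gzip.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) { gz.write(raw); }
        return bos.toByteArray();
    }

    // Drain a stream and count the bytes delivered to the consumer.
    static long countBytes(InputStream in) throws IOException {
        long n = 0;
        byte[] buf = new byte[8192];
        int k;
        while ((k = in.read(buf)) != -1) n += k;
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "<s> <p> <o> .\n".repeat(50_000).getBytes();
        byte[] packed = gzip(raw);

        long fromRaw = countBytes(new ByteArrayInputStream(raw));
        long fromGz  = countBytes(new GZIPInputStream(new ByteArrayInputStream(packed)));

        System.out.println(fromRaw == fromGz);          // same data either way
        System.out.println(packed.length < raw.length); // gzip much smaller on disk
    }
}
```

The trade-off above is that gzip shrinks what the disk must deliver at
the price of a CPU pass, so the gap between the two narrows when the raw
file already streams well sequentially.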



PS the Turtle parser is compliant with the latest RDF 1.1 spec and the draft
RDF 1.1 Turtle test suite.

Do we have these implementations over at Any23?

So I suppose the underlying question/conversation/discussion I was
putting forward concerns where, how and if both projects can benefit?
We both (communities) have tried to have this before... however, now
that the Scottish national football team is non-existent, I really have
nothing to do...

I know this is not a trivial issue... however I hope we are moving in
the right direction.

Yes.

The negative side


  Lewis

