Re: [DISCUSS] SDB future

Andy Seaborne Wed, 05 Jun 2013 03:34:34 -0700


On 28/05/13 18:22, Rob Vesse wrote:

Personally I would vote for Option 3: none of us is maintaining it, nor do
we have the time/interest to do so.  To continue to put out releases
implies a level of support that does not really exist.

Over in dotNetRDF we long ago abandoned any use of SQL backed storage
because it just doesn't scale as well as a native triple store.

I understand the points about why people may prefer SDB over TDB but
whatever we choose to do we should start making it clear to people that
SDB is deprecated/dormant and we recommend they investigate other
alternatives.

Let's also be clear that TDB is not the only alternative; it is just the
one that comes with Jena.  Most of the big triple store vendors have Jena
providers that will let you access commercially developed triple stores
via Jena (e.g. Virtuoso, Stardog, OWLIM, Oracle, IBM DB2 etc.).  Many of
these commercial stores include the type of clustering, replication and
failover capabilities that Simon is highlighting as being desirable in
highly scalable production scenarios.

Good point that there is Jena-as-client, able to work with any
SPARQL-compliant system (that's what standards are for, right?!),
Jena-as-framework (additional storage options) and
Jena-as-database-provider + Jena-as-SPARQL server. This may not come
across clearly enough - if not, what can we do to fix that?


        Andy


Rob



On 5/27/13 7:43 AM, "Simon Helsen" <[email protected]> wrote:

Andy, others,

we do not use SDB because it is way too slow for us. Although I'm sure it
can be improved as you suggested below, we do not believe it will ever
come close to TDB's performance because of how SDB is designed. The fact
it is used at all keeps surprising me, but it probably doesn't matter for
simple use cases, especially if the dataset remains small. Btw, a little
while back we were reconsidering it because SDB supports multiple vendors
and our use-case was not very performance sensitive, but it turned out
that it was still too slow for our needs.

As for motivations why some people may prefer SDB over TDB, I don't think
it is just "SQL" and corporate acceptance. There are some very good
reasons why file-based systems like TDB are difficult to use in commercial
deployments. Corporate Java-based server deployments are almost always
based on one or more app servers (JEE, but sometimes just a regular web
server) where *all* persisted state goes into a relational database.
Storing state on the file system is generally taboo for many reasons,
including: the inability to cluster the app server - a critical step to
scale up beyond a few hundred users; and the fact that organizations
generally have a hard time understanding and managing file system state as
opposed to standard relational database management. For instance, how do
you perform online backups, and when and how does the state corrupt? On a
DB server, admins are watchful for running out of disk space. This sort of
monitoring is usually less critical on the app server, but now, suddenly,
there is this growing "thing" they have to back up because it will corrupt
if you run out of disk space. On top of that, DB servers usually use very
fast and expensive disk systems (RAIDs, SSDs, etc.). This is usually not
the case for the app server. Moreover, when customers realize there is
this large set of data on the file system, they have a tendency to put it
on larger disks connected via NFS, which is unfortunately very dangerous
because even short network glitches can corrupt TDB. All of this is
manageable if you carefully encapsulate TDB and provide good
administration tools on top of your system, but it is not trivial, and it
doesn't come out of the box. Cluster management especially is quite
tricky, as you can imagine.

Just wanted to put that out there as to why SQL-based systems remain
attractive. I would advise, though, stating more clearly on the SDB
download page that SDB is deprecated and no longer actively supported
(unless that changes, of course).

Simon



From: Andy Seaborne <[email protected]>
To: [email protected]
Date: 05/25/2013 06:19 AM
Subject: [DISCUSS] SDB future



Yes - I'm conflicted as well, flip-flopping between opt1 and opt3.

There is enough user@ traffic to suggest it's used.  I'm guessing it's
the "SQL" part that makes it easier in corp IT. TDB is faster, scales
better and is better supported (and I'm not corp IT bound).

There are ways to improve its performance - pushing some filters into
SQL for example - so there's scope for development.
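
To illustrate what "pushing a filter into SQL" means, here is a minimal
sketch. It is not SDB's code or schema: it assumes Python's sqlite3 and a
hypothetical single `triples` table, and the function names are made up for
illustration. The first query function fetches every row for the property
and filters in the client; the second puts the same condition into the
WHERE clause so the database discards rows before returning them.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a toy (hypothetical) triples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o INTEGER)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("ex:a", "ex:age", 17),
            ("ex:b", "ex:age", 42),
            ("ex:c", "ex:age", 65),
            ("ex:a", "ex:height", 180),
        ],
    )
    return conn


def filter_in_client(conn, min_age):
    """Fetch every ex:age row, then apply the filter outside the database."""
    rows = conn.execute(
        "SELECT s, o FROM triples WHERE p = ?", ("ex:age",)
    ).fetchall()
    return sorted(s for s, o in rows if o > min_age)


def filter_in_sql(conn, min_age):
    """Apply the same filter inside the SQL query, so fewer rows come back."""
    rows = conn.execute(
        "SELECT s FROM triples WHERE p = ? AND o > ?", ("ex:age", min_age)
    ).fetchall()
    return sorted(s for (s,) in rows)
```

Both functions return the same subjects; the difference is only where the
comparison runs and how many rows cross the database boundary.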

Option 1 - add to the main distribution, remove if it becomes a block -
means that there is no additional work on a release vote.

Testing SDB using Derby only (that's what the junit does by default) is
easy to set up because it's pulled in by maven.  It only runs embedded,
not as a server, but it does check the code generation.  Unlike other
Java SQL DBs, Derby implements tree join plans (the others only do
linear join plans, which makes some optional cases impossible - the code
falls back to brute-force-and-ignorance in these cases). Derby is quite
picky about its SQL 92.  SDB tests without additional setup.

We state on users@ this is the position as an indication that we still
have option 3 (retirement) available.  Unless we shake the tree a bit,

Proposal: (option 1)

   add to apache-jena, remove at the first sign of trouble.
   make a clear statement of situation on users@ including
     encouraging people to come forward
   option 3 still on the cards.

I've added DISCUSS to the subject line for now to leave open the
possibility of a vote because it affects the whole project.

But all PMC chipping in here is enough.

I don't see a rush to make a choice just yet.

                 Andy

On 22/05/13 11:47, Claude Warren wrote:

I am conflicted about this one.  I think we need SDB (or something similar)
that will allow users to use standard infrastructure (shared/pooled DB is
fairly common).  But I don't have the bandwidth to support it.

Is there a status wherein we release an SDB package with each release of
Jena but only ensure that the current test cases work -- that is, the
latest release doesn't break something?  Perhaps with a reduced set of
supported DBs (perhaps Derby & MySQL).

If not then I think we take Andy's approach and release one more before
putting it on the shelf.


On Wed, May 22, 2013 at 10:35 AM, Charles Li <[email protected]> wrote:

What are the alternatives to SDB? I have a 4GB RDF/XML file to load for
later queries.

Thanks!
- Charles

On May 21, 2013, at 12:00 PM, Stephen Owens <[email protected]> wrote:

+1 for option 3 if no one currently is taking ownership of that project.
I think it's a useful signal to potential adopters about what they should
expect.


On 2013-05-21, at 12:47 PM, Andy Seaborne <[email protected]> wrote:

SDB is getting some user attention but not much developer attention.  I
was hoping that there would be a contribution to go with JENA-447 but
nothing has come in.  I don't have the bandwidth to even answer questions
about it properly, partly because I don't use it.  I guess others are in a
similar position.


I do think we should be clear as to its status.

In the future, I see these options:

1/ Add jena-sdb to the main distribution.
   (If it becomes a block on a release, remove it.)

2/ As is - release "sometimes".

3/ Dormant SDB.
   This is the last release unless some activity arises to maintain it.

   Keep the source around but move out of trunk.
   Can be built from source.

4/ Legacy SDB.
   More definite statement than (3) that it is dropped.
   Keep the source around.

For 3 and 4, where there are no plans to release again if nothing changes,
the snapshot builds should be stopped.  Users can build from source if
they want to but the current snapshot should not become a
distribution-under-the-radar which I feel it becomes if there are no plans
to make it a formal release.


Thoughts?

I'm tending towards doing this one last release then (3).

    Andy
