Re: [DISCUSS] SDB future

Andy Seaborne Wed, 05 Jun 2013 05:14:22 -0700

On 29/05/13 03:27, Holger Knublauch wrote:

At TopQuadrant we do use and recommend SDB for enterprise solutions,
mainly due to the fact that customers can rely on their existing SQL
infrastructure. Performance is not a direct issue for us because we
apply a caching layer on top of it (currently a home-grown in-memory
cache, but in the future possibly TDB). We do have a growing
installation base of successful deployments (TopBraid EVN) and customers
seem to be happy. Having to rely on commercial alternatives would affect
the overall price tag of our solutions, and having an open source
solution that seamlessly works with Jena is a great asset.

What is cached? Triples? or graphs? or parts of graphs?

I'm wondering if GSP (SPARQL Graph Store Protocol) is an option - if
graphs are the unit, and SDB is a graph data management layer, GSP over
SQL is really very simple and may be a useful tool in the toolbox.

(or indeed over NoSQL KV-store)
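
To make the "GSP over SQL is really very simple" point concrete, here is a
minimal sketch of the four Graph Store Protocol operations (GET, PUT, POST,
DELETE on a named graph) mapped onto a single SQL table. Everything in it
is an assumption for illustration: the `graphs` table, the `SqlGraphStore`
class and the choice to store each graph as an opaque N-Triples string are
hypothetical and are not SDB's layout or API. SQLite stands in for the SQL
database only because it ships with Python.

```python
import sqlite3


class SqlGraphStore:
    """Hypothetical graph-level store: one row per named graph.

    The graph body is kept as an opaque N-Triples string, so the unit of
    storage is the whole graph, as in the Graph Store Protocol.
    """

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS graphs ("
            "name TEXT PRIMARY KEY, ntriples TEXT NOT NULL)"
        )

    def get(self, name):
        """GSP GET: return the stored serialization, or None if absent."""
        row = self.conn.execute(
            "SELECT ntriples FROM graphs WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

    def put(self, name, ntriples):
        """GSP PUT: create the graph or replace it wholesale."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO graphs (name, ntriples) VALUES (?, ?)",
                (name, ntriples),
            )

    def post(self, name, ntriples):
        """GSP POST: merge new triples into the graph.

        Simplification: the merge is a set union of text lines, which is
        only right for identically formatted N-Triples without blank nodes.
        """
        existing = self.get(name) or ""
        lines = set(existing.splitlines()) | set(ntriples.splitlines())
        merged = sorted(line for line in lines if line.strip())
        self.put(name, "\n".join(merged))

    def delete(self, name):
        """GSP DELETE: drop the graph; True if it existed."""
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM graphs WHERE name = ?", (name,)
            )
        return cur.rowcount > 0
```

The same four methods would work over a key-value store with the graph
name as the key; a real deployment would put an HTTP layer and an RDF
parser in front of this rather than trusting raw strings.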

I have brought up the topic of this thread in our management to see
whether we can allocate any resources to its future, but I cannot report
any decision at this stage.

Interesting - do let us know how that goes. TQ have also mentioned
extensions to SDB, for example multiple datasets in one SQL database.


        Andy


Thanks,
Holger



On 5/28/2013 0:43, Simon Helsen wrote:

Andy, others,

we do not use SDB because it is way too slow for us. Although I'm sure it
can be improved as you suggested below, we do not believe it will ever
come close to TDB's performance because of how SDB is designed. The fact
it is used at all keeps surprising me, but it probably doesn't matter for
simple use cases, especially if the dataset remains small. Btw, a little
while back we were reconsidering it because SDB supports multiple vendors
and our use-case was not very performance sensitive, but it turned out
that it was still too slow for our needs.

As for motivations why some people may prefer SDB over TDB, I don't think
it is just "SQL" and corporate acceptance. There are some very good
reasons why file-based systems like TDB are difficult to use in commercial
deployments. Corporate Java-based server deployments are almost always
based on one or more app servers (JEE, but sometimes just a regular web
server) where *all* persisted state goes into a relational database.
Storing state on the file system is generally taboo for many reasons,
including: the inability to cluster the app server - a critical step to
scale up beyond a few hundred users; and the fact that organizations
generally have a hard time understanding and managing file system state
as opposed to standard relational database management. For instance, how
do you perform online backups, and when and how does the state corrupt?
On a DB server, admins are watchful for running out of disk space. This
sort of monitoring is usually less critical on the app server, but now,
suddenly, there is this growing "thing" they have to back up because it
will corrupt if you run out of disk space. On top of that, DB servers
usually use very fast and expensive disk systems (RAIDs, SSDs, etc.).
This is usually not the case for the app server. Also, when customers
realize there is this large set of data on the file system, they have a
tendency to put it on larger disks connected via NFS, which is
unfortunately very dangerous because even short network glitches can
corrupt TDB. All of this is manageable if you carefully encapsulate TDB
and provide good administration tools on top of your system, but it is
not trivial, and it doesn't come out of the box. Cluster management
especially is quite tricky, as you can imagine.

Just wanted to set that out there as to why SQL-based systems remain
attractive. I would advise, though, stating more clearly on the SDB
download page that SDB is deprecated and no longer actively supported
(unless that changes, of course).

Simon



From:
Andy Seaborne <[email protected]>
To:
[email protected],
Date:
05/25/2013 06:19 AM
Subject:
[DISCUSS] SDB future



Yes - I'm conflicted as well, flip-flopping between opt1 and opt3.

There is enough user@ traffic to suggest it's used.  I'm guessing it's
the "SQL" part that makes it easier in corp IT. TDB is faster, scales
better and is better supported (and I'm not corp IT bound).

There are ways to improve its performance - pushing some filters into
SQL for example - so there's scope for development.
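
As a rough illustration of what "pushing some filters into SQL" means,
the sketch below answers the same query (roughly the SPARQL pattern
`?s ex:price ?v FILTER(?v > 10)`) two ways: once by fetching every match
and filtering in the application, and once by letting the database apply
the condition in its WHERE clause. The single `triples` table, its column
names and both function names are assumptions made up for the example;
they are not SDB's schema or code, and SQLite is used only because it is
bundled with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-table layout, NOT SDB's actual schema.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o_num REAL)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:a", "ex:price", 5.0),
        ("ex:b", "ex:price", 20.0),
        ("ex:c", "ex:price", 35.0),
    ],
)


def filter_in_app(conn, limit):
    """Fetch every row matching the triple pattern, then apply the
    FILTER in application code. All matches cross the DB connection."""
    rows = conn.execute(
        "SELECT s, o_num FROM triples WHERE p = 'ex:price'"
    ).fetchall()
    return sorted(s for s, value in rows if value > limit)


def filter_in_sql(conn, limit):
    """Push the FILTER into the WHERE clause so the database discards
    non-matching rows before they are returned."""
    rows = conn.execute(
        "SELECT s FROM triples WHERE p = 'ex:price' AND o_num > ?",
        (limit,),
    ).fetchall()
    return sorted(s for (s,) in rows)
```

Both functions return the same subjects; the difference is how many rows
leave the database, which matters more as the data grows. Only filters
whose semantics match the SQL engine's can safely be pushed down this way.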

Option 1 - add to the main distribution, remove if it becomes a block -
means that there is no additional work on a release vote.

Testing SDB using Derby only (that's what the junit does by default) is
easy to set up because it's pulled in by maven.  It only runs embedded,
not as a server, but it does check the code generation.  Unlike other
Java SQL DBs, Derby implements tree join plans (the others only do
linear join plans, which makes some optional cases impossible - the code
falls back to brute-force-and-ignorance in these cases). Derby is quite
picky about its SQL 92.  SDB tests without additional setup.

We state on users@ this is the position as an indication that we still
have option 3 (retirement) available.  Unless we shake the tree a bit,

Proposal: (option 1)

    add to apache-jena, remove at the first sign of trouble.
    make a clear statement of situation on users@ including
      encouraging people to come forward
    option 3 still on the cards.

I've added DISCUSS to the subject line for now to leave open the
possibility of a vote because it affects the whole project.

But all PMC chipping in here is enough.

I don't see a rush to make a choice just yet.

                  Andy

On 22/05/13 11:47, Claude Warren wrote:

I am conflicted about this one.  I think we need SDB (or something
similar) that will allow users to use standard infrastructure
(shared/pooled DB is fairly common).  But I don't have the bandwidth to
support it.

Is there a status wherein we release the SDB package with each release of
Jena but only ensure that the current test cases work -- that is, the
latest release doesn't break something?  Perhaps with a reduced set of
supported DBs (perhaps Derby & MySQL).

If not then I think we take Andy's approach and release one more before
putting it on the shelf.


On Wed, May 22, 2013 at 10:35 AM, Charles Li <[email protected]> wrote:

What are alternatives to SDB? I have a 4GB RDF/XML file to load for later
queries.

Thanks!
- Charles

On May 21, 2013, at 12:00 PM, Stephen Owens <[email protected]> wrote:

+1 for option 3 if no one currently is taking ownership of that project.
I think it's a useful signal to potential adopters about what they should
expect.

On 2013-05-21, at 12:47 PM, Andy Seaborne <[email protected]> wrote:

SDB is getting some user attention but not much developer attention.  I
was hoping that there would be a contribution to go with JENA-447 but
nothing has come in.  I don't have the bandwidth to even answer questions
about it properly, partly because I don't use it.  I guess others are in
a similar position.

I do think we should be clear as to its status.

In the future, I see these options:

1/ Add jena-sdb to the main distribution.
   (If it becomes a block on a release, remove it.)

2/ As is - release "sometimes".

3/ Dormant SDB.
   This is the last release unless some activity arises to maintain it.
   Keep the source around but move out of trunk.
   Can be built from source.

4/ Legacy SDB.
   More definite statement than (3) that it is dropped.
   Keep the source around.

For 3 and 4, where there are no plans to release again if nothing
changes, the snapshot builds should be stopped.  Users can build from
source if they want to but the current snapshot should not become a
distribution-under-the-radar which I feel it becomes if there are no
plans to make it a formal release.

Thoughts?

I'm tending towards doing this one last release then (3).

    Andy
