Re: [DISCUSS] SDB future

Simon Helsen Mon, 27 May 2013 07:50:48 -0700

Andy, others,

we do not use SDB because it is far too slow for us. Although I'm sure it 
can be improved as you suggest below, we do not believe it will ever come 
close to TDB's performance because of how SDB is designed. The fact that it 
is used at all keeps surprising me, but it probably doesn't matter for 
simple use cases, especially if the dataset remains small. Btw, a little 
while back we reconsidered it, because SDB supports multiple database 
vendors and our use case was not very performance-sensitive, but it turned 
out that it was still too slow for our needs.

As for why some people may prefer SDB over TDB, I don't think it is just 
"SQL" and corporate acceptance. There are some very good reasons why 
file-based systems like TDB are difficult to use in commercial deployments. 
Corporate Java-based server deployments are almost always based on one or 
more app servers (JEE, but sometimes just a regular web server) where *all* 
persisted state goes into a relational database. Storing state on the file 
system is generally taboo for many reasons, including the inability to 
cluster the app server (a critical step to scale up beyond a few hundred 
users) and the fact that organizations generally have a hard time 
understanding and managing file system state, as opposed to standard 
relational database management. For instance, how do you perform online 
backups, and when and how does the state get corrupted?

On a DB server, admins are watchful for running out of disk space. This 
sort of monitoring is usually less critical on the app server, but now, 
suddenly, there is this growing "thing" they have to back up, because it 
will get corrupted if you run out of disk space. In addition, DB servers 
usually use very fast and expensive disk systems (RAIDs, SSDs, etc.), which 
is usually not the case for the app server. On top of that, when customers 
realize there is this large set of data on the file system, they tend to 
put it on larger disks connected via NFS, which is unfortunately very 
dangerous because even short network glitches can corrupt TDB. All of this 
is manageable if you carefully encapsulate TDB and provide good 
administration tools on top of your system, but it is not trivial, and it 
doesn't come out of the box. Cluster management in particular is quite 
tricky, as you can imagine.

Just wanted to put that out there as to why SQL-based systems remain 
attractive. I would advise, though, stating more clearly on the SDB 
download page that SDB is deprecated and no longer actively supported 
(unless that changes, of course).

Simon

From: Andy Seaborne <[email protected]>
To: [email protected]
Date: 05/25/2013 06:19 AM
Subject: [DISCUSS] SDB future

Yes - I'm conflicted as well, flip-flopping between option 1 and option 3.

There is enough user@ traffic to suggest it's used.  I'm guessing it's 
the "SQL" part that makes it easier in corp IT. TDB is faster, scales 
better and is better supported (and I'm not corp IT bound).

There are ways to improve its performance (pushing some filters into 
SQL, for example), so there's scope for development.

Option 1 (add to the main distribution, remove if it becomes a blocker) 
means that there is no additional work on a release vote.

Testing SDB using Derby only (that's what the JUnit tests do by default) 
is easy to set up because Derby is pulled in by Maven.  It only runs 
embedded, not as a server, but it does check the code generation.  Unlike 
other Java SQL DBs, Derby implements tree join plans (the others only do 
linear join plans, which makes some optional cases impossible; the code 
falls back to brute-force-and-ignorance in these cases). Derby is quite 
picky about its SQL-92.  SDB can be tested without additional setup.

We state on users@ that this is the position, as an indication that we 
still have option 3 (retirement) available.  Unless we shake the tree a bit,

Proposal: (option 1)

   add to apache-jena, remove at the first sign of trouble.
   make a clear statement of the situation on users@, including
     encouraging people to come forward
   option 3 still on the cards.

I've added DISCUSS to the subject line for now to leave open the 
possibility of a vote because it affects the whole project.

But the whole PMC chipping in here is enough.

I don't see a rush to make a choice just yet.

                 Andy

On 22/05/13 11:47, Claude Warren wrote:
> I am conflicted about this one.  I think we need SDB (or something similar)
> that will allow users to use standard infrastructure (shared/pooled DB is
> fairly common).  But I don't have the bandwidth to support it.
>
> Is there a status whereby we release the SDB package with each release of
> Jena but only ensure that the current test cases work -- that is, the latest
> release doesn't break something?  Perhaps with a reduced set of supported
> DBs (perhaps Derby & MySQL).
>
> If not, then I think we take Andy's approach and release one more before
> putting it on the shelf.
>
>
> On Wed, May 22, 2013 at 10:35 AM, Charles Li <[email protected]> wrote:
>
>> What are the alternatives to SDB? I have a 4GB RDF/XML file to load for
>> later querying.
>>
>> Thanks!
>> - Charles
>>
>> On May 21, 2013, at 12:00 PM, Stephen Owens <[email protected]> wrote:
>>
>>> +1 for option 3 if no one is currently taking ownership of that project.
>>> I think it's a useful signal to potential adopters about what they should
>>> expect.
>> expect.
>>>
>>> On 2013-05-21, at 12:47 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>>> SDB is getting some user attention but not much developer attention. I
>>>> was hoping that there would be a contribution to go with JENA-447 but
>>>> nothing has come in.  I don't have the bandwidth to even answer questions
>>>> about it properly, partly because I don't use it.  I guess others are in a
>>>> similar position.
>>>>
>>>> I do think we should be clear as to its status.
>>>>
>>>> In the future, I see these options:
>>>>
>>>> 1/ Add jena-sdb to the main distribution.
>>>>   (If it becomes a block on a release, remove it.)
>>>>
>>>> 2/ As is - release "sometimes".
>>>>
>>>> 3/ Dormant SDB.
>>>>   This is the last release unless some activity arises to maintain it.
>>>>   Keep the source around but move out of trunk.
>>>>   Can be built from source.
>>>>
>>>> 4/ Legacy SDB.
>>>>   More definite statement than (3) that it is dropped.
>>>>   Keep the source around.
>>>>
>>>> For 3 and 4, where there are no plans to release again if nothing
>>>> changes, the snapshot builds should be stopped.  Users can build from
>>>> source if they want to, but the current snapshot should not become a
>>>> distribution-under-the-radar, which I feel it becomes if there are no
>>>> plans to make it a formal release.
>>>>
>>>> Thoughts?
>>>>
>>>> I'm tending towards doing this one last release then (3).
>>>>
>>>>    Andy
>>
>
>
>
