Re: [DISCUSS] SDB future

Rob Vesse Tue, 28 May 2013 10:23:47 -0700

Personally I would vote for Option 3: none of us are maintaining it, nor do
we have the time/interest to do so.  To continue to put out releases
implies a level of support that does not really exist.


Over in dotNetRDF we long ago abandoned any use of SQL backed storage
because it just doesn't scale as well as a native triple store.

I understand the points about why people may prefer SDB over TDB but
whatever we choose to do we should start making it clear to people that
SDB is deprecated/dormant and we recommend they investigate other
alternatives.

Let's also be clear that TDB is not the only alternative; it is just the
one that comes with Jena.  Most of the big triple store vendors have Jena
providers that will let you access commercially developed triple stores
via Jena (e.g. Virtuoso, Stardog, OWLIM, Oracle, IBM DB2, etc.).  Many of
these commercial stores include the type of clustering, replication and
failover capabilities that Simon is highlighting as being desirable in
highly scalable production scenarios.

Rob



On 5/27/13 7:43 AM, "Simon Helsen" <[email protected]> wrote:

>Andy, others,
>
>we do not use SDB because it is way too slow for us. Although I'm sure it
>can be improved as you suggested below, we do not believe it will ever
>come close to TDB's performance because of how SDB is designed. The fact
>it is used at all keeps surprising me, but it probably doesn't matter for
>simple use cases, especially if the dataset remains small. Btw, a little
>while back we were reconsidering it because SDB supports multiple vendors
>and our use case was not very performance sensitive, but it turned out
>that it was still too slow for our needs.
>
>As for motivations why some people may prefer SDB over TDB, I don't think
>it is just "SQL" and corporate acceptance. There are some very good
>reasons why file-based systems like TDB are difficult to use in
>commercial deployments. Corporate Java-based server deployments are
>almost always based on one or more app servers (JEE, but sometimes just a
>regular web server) where *all* persisted state goes into a relational
>database. Storing state on the file system is generally taboo for many
>reasons, including: the inability to cluster the app server - a critical
>step to scale up beyond a few hundred users; the fact that organizations
>generally have a hard time understanding and managing file system state
>as opposed to standard relational database management. For instance, how
>do you perform online backups, and when and how does the state corrupt?
>On a DB server, admins are watchful for running out of disk space. This
>sort of monitoring is usually less critical on the app server, but now,
>suddenly, there is this growing "thing" they have to back up because it
>will corrupt if you run out of disk space. On top of that, DB servers
>usually use very fast and expensive disk systems (RAIDs, SSDs, etc.).
>This is usually not the case for the app server. Moreover, when customers
>realize there is this large set of data on the file system, they have a
>tendency to put it on larger disks connected via NFS, which is
>unfortunately very dangerous because even short network glitches can
>corrupt TDB. All of this is manageable if you carefully encapsulate TDB
>and provide good administration tools on top of your system, but it is
>not trivial, and it doesn't come out of the box. Cluster management
>especially is quite tricky, as you can imagine.
>
>Just wanted to put that out there as to why SQL-based systems remain
>attractive. I would advise, though, stating more clearly on the SDB
>download page that SDB is deprecated and no longer actively supported
>(unless that changes, of course).
>
>Simon
>
>
>
>From: Andy Seaborne <[email protected]>
>To: [email protected]
>Date: 05/25/2013 06:19 AM
>Subject: [DISCUSS] SDB future
>
>
>
>Yes - I'm conflicted as well, flip-flopping between opt1 and opt3.
>
>There is enough user@ traffic to suggest it's used.  I'm guessing it's
>the "SQL" part that makes it easier in corp IT. TDB is faster, scales
>better and is better supported (and I'm not corp IT bound).
>
>There are ways to improve its performance - pushing some filters into
>SQL for example - so there's scope for development.
>
>Option 1 - add to the main distribution, remove if it becomes a block -
>means that there is no additional work on a release vote.
>
>Testing SDB using Derby only (that's what the junit does by default) is
>easy to set up because it's pulled in by maven.  It only runs embedded,
>not as a server, but it does check the code generation.  Unlike other
>Java SQL DBs, Derby implements tree join plans (the others only do
>linear join plans, which makes some optional cases impossible - the code
>falls back to brute-force-and-ignorance in these cases). Derby is quite
>picky about its SQL 92.  SDB tests without additional setup.
>
>We state on users@ this is the position as an indication that we still
>have option 3 (retirement) available.  Unless we shake the tree a bit,
>
>Proposal: (option 1)
>
>   add to apache-jena, remove at the first sign of trouble.
>   make a clear statement of situation on users@ including
>     encouraging people to come forward
>   option 3 still on the cards.
>
>I've added DISCUSS to the subject line for now to leave open the
>possibility of a vote because it affects the whole project.
>
>But all PMC chipping in here is enough.
>
>I don't see a rush to make a choice just yet.
>
>                 Andy
>
>On 22/05/13 11:47, Claude Warren wrote:
>> I am conflicted about this one.  I think we need SDB (or something
>> similar) that will allow users to use standard infrastructure
>> (shared/pooled DB is fairly common).  But I don't have the bandwidth to
>> support it.
>>
>> Is there a status wherein we release the SDB package with each release
>> of Jena but only ensure that the current test cases work -- that is,
>> the latest release doesn't break something?  Perhaps with a reduced set
>> of supported DBs (perhaps Derby & MySQL).
>>
>> If not then I think we take Andy's approach and release one more before
>> putting it on the shelf.
>>
>>
>> On Wed, May 22, 2013 at 10:35 AM, Charles Li <[email protected]> wrote:
>>
>>> What are alternatives to SDB? I have a 4GB RDF/XML to load for later
>>> queries
>>>
>>> Thanks!
>>> - Charles
>>>
>>> On May 21, 2013, at 12:00 PM, Stephen Owens <[email protected]>
>>> wrote:
>>>
>>>> +1 for option 3 if no one currently is taking ownership of that
>>>> project. I think it's a useful signal to potential adopters about what
>>>> they should expect.
>>>>
>>>> On 2013-05-21, at 12:47 PM, Andy Seaborne <[email protected]> wrote:
>>>>
>>>>> SDB is getting some user attention but not much developer attention.
>>>>> I was hoping that there would be a contribution to go with JENA-447
>>>>> but nothing has come in.  I don't have the bandwidth to even answer
>>>>> questions about it properly, partly because I don't use it.  I guess
>>>>> others are in a similar position.
>>>>>
>>>>> I do think we should be clear as to its status.
>>>>>
>>>>> In the future, I see these options:
>>>>>
>>>>> 1/ Add jena-sdb to the main distribution.
>>>>>   (If it becomes a block on a release, remove it.)
>>>>>
>>>>> 2/ As is - release "sometimes".
>>>>>
>>>>> 3/ Dormant SDB.
>>>>>   This is the last release unless some activity arises to maintain
>>>>>   it.
>>>>>   Keep the source around but move out of trunk.
>>>>>   Can be built from source.
>>>>>
>>>>> 4/ Legacy SDB.
>>>>>   More definite statement than (3) that it is dropped.
>>>>>   Keep the source around.
>>>>>
>>>>> For 3 and 4, where there are no plans to release again if nothing
>>>>> changes, the snapshot builds should be stopped.  Users can build from
>>>>> source if they want to, but the current snapshot should not become a
>>>>> distribution-under-the-radar, which I feel it becomes if there are no
>>>>> plans to make it a formal release.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> I'm tending towards doing this one last release then (3).
>>>>>
>>>>>    Andy
>>>
>>
>>
>>
>
>
>
