Re: proposal to change return type for size() in graph

Rob Vesse Wed, 06 Nov 2013 04:43:33 -0800

Comments inline:

On 06/11/2013 10:48, "Andy Seaborne" <[email protected]> wrote:


>We release as a whole, so changing all modules at once is doable for us.
>
>External implementations don't seem to track versions very closely (years
>of difference) so all this deprecation cycle stuff can only work on a
>very long timescale.  Also, they don't allow drop-in later versions of
>Jena onto old versions of their implementation, which is the killer for
>smooth changes.
>
>So one option is just make the change.  Any smoothed transition is not,
>in practice, helping anyone.

I agree (see my later comments on the package rename); it would be nice to
just change the API on the Jena 3 branch and leave those who want to stick
with Jena 2 to lag behind as they will.  Moving to Jena 3 potentially
allows us to ignore niceties like deprecation cycles and just simply
remove/change stuff as necessary.  To aid transition we can always mark
things as deprecated on the Jena 2 branch with notes that the API is
changing in Jena 3.

>
>Or a transition might be:
>
>We could mark int/size() @Deprecated, make it return Integer.MAX_VALUE meaning
>"go ask another method" and add long/size2() that returns the proper
>answer.  GraphBase implements a not-preferred version where size() calls
>size2().
>
>We switch all our code to use size2() [1].
>
>This is still an interface incompatibility but possibly smoother.
>People using GraphBase have to recompile as they change version of Jena
>(or maybe not - all the right methods exist and don't change).
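
A minimal sketch of the transition described above, under some assumptions. The names SizeTransitionSketch, SizedGraph, SizedGraphBase and FixedSizeGraph are placeholders for illustration only, not the actual Jena Graph/GraphBase classes. The base class here clamps size() to Integer.MAX_VALUE when the count does not fit in an int, which is one possible reading of "go ask another method".

```java
final class SizeTransitionSketch {

    /** Hypothetical stand-in for the graph interface during the transition. */
    interface SizedGraph {
        /** Old accessor; Integer.MAX_VALUE means "go ask size2()". */
        @Deprecated
        int size();

        /** New accessor returning the full count. */
        long size2();
    }

    /** Hypothetical stand-in for a base class: size() delegates to size2(). */
    static abstract class SizedGraphBase implements SizedGraph {
        @Deprecated
        @Override
        public int size() {
            long n = size2();
            // Clamp rather than cast, so a large graph never reports a negative size.
            return n > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) n;
        }
    }

    /** Toy implementation that only knows its triple count. */
    static final class FixedSizeGraph extends SizedGraphBase {
        private final long triples;

        FixedSizeGraph(long triples) {
            this.triples = triples;
        }

        @Override
        public long size2() {
            return triples;
        }
    }

    public static void main(String[] args) {
        SizedGraph small = new FixedSizeGraph(42L);
        SizedGraph large = new FixedSizeGraph(2_460_000_000L);
        System.out.println(small.size() + " / " + small.size2());
        System.out.println(large.size() + " / " + large.size2());
    }
}
```

In this sketch, an implementation that extends the base class only has to supply size2(); callers that still use size() get a clamped value instead of an overflowed one. Implementations that implement the interface directly would still break, which is the interface incompatibility mentioned above.
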
>
>"Possibly" because of the long lag on versions we see anyway.  Other
>changes, and we have to have the scope to make other changes somehow,
>stop drop-in upgrades of old systems often enough as it is.
>
>Or.
>
>Jena3.  Interface spring cleaning.  Other changes.

+1

>
>There is also the data change around xsd:string, which might warrant Jena3.
>
>I want to avoid getting into a long trough for Jena3 so I'm looking
>for how we'd get out of the change phase rather than just how to get
>into it.

A first pass for Jena 3 would literally be package rename, obvious
interface changes like this one and then push out an initial release.

>
>Maybe we start running two codebases in parallel for a while, Jena2
>being "maintenance only". If we delay package renaming for a while, it's
>quite easy to roll J3 fixes back into J2.

+1

-1 to delaying package renaming.  I feel that makes things trickier than
they need to be, and it doesn't help version laggers if they pick up 3.0.0
with virtually the same APIs and then 3.1.0 changes all the package
names.

Back-porting to Jena 2 will probably mostly just require a Find/Replace on
com.hp.hpl.jena to org.apache.jena so I don't see this as a reason to
delay the package rename if we're going to do Jena 3 anyway.

Moving our source control to git would make maintaining parallel branches
and back porting changes much easier.  We can then take advantage of
things like git cherry-pick to aid back porting bug fixes from Jena 3 to
Jena 2.  So I would suggest we proceed to move to git and set up
appropriate branches for this workflow.

Rob

>
>Of course, we have the version-lag to take into account.
>
>JIRA is a good place to collect ideas and thoughts:
>
>JENA-189 (Jena3/technical)
>JENA-193 (RDF 1.1)
>
>Other JIRA include:
>
>JENA-190 (delivery)
>JENA-191 (module structure)
>JENA-192 (package naming)
>
>       Andy
>
>
>PS Not a double please - a long is large enough and doubles have less
>precision.  2^63-1 really is a very large number - 8 exa-triples.  And
>in Java 8, 2^64-1 (sort of).
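
A small sketch of the precision point above, assuming nothing beyond the standard library. LongVersusDoubleSketch and exactAsDouble are made-up names for illustration. It checks whether a count survives conversion to double, and shows the Java 8 unsigned view of a long, which is presumably what "sort of" refers to.

```java
import java.math.BigDecimal;

final class LongVersusDoubleSketch {

    /** True if the count has an exact double representation. */
    static boolean exactAsDouble(long count) {
        // new BigDecimal(double) keeps the double's exact value, so this
        // comparison is not fooled by the narrowing cast back to long.
        return new BigDecimal((double) count).compareTo(BigDecimal.valueOf(count)) == 0;
    }

    public static void main(String[] args) {
        // Every whole number up to 2^53 is exact as a double; 2^53 + 1 is not.
        System.out.println(exactAsDouble(1L << 53));
        System.out.println(exactAsDouble((1L << 53) + 1));

        // 2^63 - 1, the largest value a signed long can hold.
        System.out.println(Long.MAX_VALUE);

        // Java 8 can read the same 64 bits as unsigned: 2^64 - 1.
        System.out.println(Long.toUnsignedString(-1L));
    }
}
```
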
>
>[1] Eclipse will do it all in one click.
>
>On 06/11/13 08:53, Claude Warren wrote:
>> On further consideration, perhaps sizeEstimate could return a Numeric
>> Literal Node.  This would provide the ability to return very large
>> numbers as doubles and smaller numbers as ints and we already have the
>> code to convert those values to primitive numbers or Number instances.
>>
>>
>> On Wed, Nov 6, 2013 at 7:32 AM, Claude Warren <[email protected]> wrote:
>>
>>> I don't see how to transition unless we change the method name to
>>> something like sizeEstimate and return a double.  I think in most cases
>>> size is used to determine which side of a join should go on the left
>>> for efficiency, and for unit tests.  We might want to return a
>>> statistical answer X +/- Y (sort of like the delta in the JUnit
>>> assertEquals(double,double,delta) tests).  But this is probably
>>> stretching a bit too far.
>>>
>>> Claude
>>>
>>>
>>> On Tue, Nov 5, 2013 at 10:28 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>>> On 04/11/13 12:22, Claude Warren wrote:
>>>>
>>>>> Currently graph.size() returns an int.  The maximum value for an int
>>>>> is 2,147,483,647 (2.1 billion), though model.size() returns a long.
>>>>>
>>>>> Does it make sense to change the return type for graph.size() to
>>>>> long?
>>>>>
>>>>> If not, and a graph exceeds 2.1B triples, should size just return
>>>>> Integer.MAX_VALUE?
>>>>>
>>>>> I ask as I am currently working on a project to load all of DBPedia
>>>>> (2.46 billion triples) into a graph.
>>>>>
>>>>> Claude
>>>>>
>>>>>
>>>> Good idea.
>>>>
>>>> How would you see the change being made? (any transition process?)
>>>>
>>>>          Andy
>>>>
>>>>
>>>
>>>
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>
>>
>>
>>
>
