Re: Range API

Brian Goetz Thu, 26 Sep 2024 06:30:33 -0700

Sorry for the not-good news, but I'm not too surprised. Computational domains like "32 bit integers" seem like they should have a lot in common with algebraic structures like groups and rings, but when you start poking at them, the compromises we make to fit them into hardware registers start to bite. (And let's not get started on floating point...) Lots of research into numeric towers in various languages, or capturing fundamental properties in type classes like Haskell's `Eq` and `Ord`, offers plenty of compromise to go with its promise.
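A few lines of Java make these compromises concrete. The class and method names are made up for this illustration. The behavior itself comes straight from Java's `int` and `double` arithmetic: `int` addition wraps around, `double` addition is not associative, and `NaN` is not `==` to itself.

```java
final class HardwareCompromises {
    // In the mathematical integers, x + 1 > x always holds; Java int wraps modulo 2^32.
    static boolean successorIsGreater(int x) {
        return x + 1 > x;
    }

    // Associativity of addition, which double arithmetic does not guarantee.
    static boolean additionIsAssociative(double a, double b, double c) {
        return (a + b) + c == a + (b + c);
    }

    // Reflexivity of ==, which fails for NaN.
    static boolean equalsItself(double d) {
        return d == d;
    }

    public static void main(String[] args) {
        System.out.println(successorIsGreater(41));                // true
        System.out.println(successorIsGreater(Integer.MAX_VALUE)); // false: MAX_VALUE + 1 wraps to MIN_VALUE
        System.out.println(additionIsAssociative(0.1, 0.2, 0.3));  // false
        System.out.println(equalsItself(Double.NaN));              // false
    }
}
```

The first case matters most for ranges, because "the next value" is exactly the operation that breaks at the top of the type.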

I think a big part of what you are running into is that you've started with a _concept_ (a deceptively simple one, at that), rather than _requirements_. And it is the open-endedness of this concept (discrete vs continuous, bounded vs half-open, including endpoints or not, etc.) that resists abstraction. Plus, without clear requirements, you will be subject to an endless barrage of "what about my pet use case" (e.g., "what about the numbers zero to ten, advancing by two"). Meanwhile, domain-specific libraries such as java.time will invent their own domain-specific answers, like Interval.

Rather than starting from the algebraic properties, perhaps start from the other end: what are the use cases where the lack of a range abstraction is problematic? I get that


    for (int i=0; i<100; i++) { ... }

is uglier and less abstract than

    for (int i : Range.of(0, 100)) { ... }

but I also don't sense people beating down the doors for that (even if the language had range literals, like `0..<100`).
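For concreteness, the hypothetical `Range.of(0, 100)` above could be backed by something as small as the following sketch. `Range` is not a JDK class; this version assumes a half-open `[start, end)` range of ints and implements `Iterable<Integer>`, so each element is boxed on the way out of the iterator and unboxed again in the loop.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical half-open integer range [start, end); not a JDK API.
final class Range implements Iterable<Integer> {
    private final int start;
    private final int end; // exclusive

    private Range(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start > end: " + start + " > " + end);
        }
        this.start = start;
        this.end = end;
    }

    static Range of(int start, int end) {
        return new Range(start, end);
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<>() {
            private int next = start;

            @Override
            public boolean hasNext() {
                return next < end;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++; // autoboxed to Integer
            }
        };
    }

    public static void main(String[] args) {
        int sum = 0;
        for (int i : Range.of(0, 100)) { // each Integer is unboxed back to int
            sum += i;
        }
        System.out.println(sum); // 4950
    }
}
```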

Where I do see people having trouble is that many range computations are error prone. For example, `String::indexOf` returns the starting index of a match; if you want to actually iterate over the characters of such a match, you have to do something like


    for (int j=index; j<index+target.length(); j++)

and you are at risk for fencepost errors when recreating the range. By contrast, an indexOf method (under a more suitable name) that returned a range would be more amenable to downstream processing. Similarly, I see errors in API usage because sometimes we specify a range by (start, end) and sometimes by (start, length), and since both are ints, we get no type checking when you pass the wrong kind of ints to such a method.
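A minimal sketch of both ideas follows. It assumes a hypothetical `IntRange` record and a hypothetical `findRange` method, neither of which exists in the JDK. `findRange` wraps `String.indexOf` and returns the whole match span. The named factories make the (start, end) vs (start, length) distinction explicit at the point where a range is built. Records require Java 16 or later.

```java
import java.util.Optional;

final class RangeSearch {
    // Hypothetical "indexOf that returns a range": the span of the first match, if any.
    static Optional<IntRange> findRange(String s, String target) {
        int index = s.indexOf(target);
        return index < 0
                ? Optional.empty()
                : Optional.of(IntRange.ofStartLength(index, target.length()));
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        findRange(text, "brown").ifPresent(match -> {
            // The end index comes from the range, so there is nothing to recompute.
            System.out.println(match);                                      // IntRange[start=10, end=15]
            System.out.println(text.substring(match.start(), match.end())); // brown
        });
    }
}

// Hypothetical half-open index range [start, end); not a JDK type.
record IntRange(int start, int end) {
    IntRange {
        if (start < 0 || start > end) {
            throw new IllegalArgumentException("invalid range [" + start + ", " + end + ")");
        }
    }

    // Named factories make (start, end) vs (start, length) explicit at the call site.
    static IntRange ofStartEnd(int start, int end) {
        return new IntRange(start, end);
    }

    static IntRange ofStartLength(int start, int length) {
        return new IntRange(start, start + length);
    }

    int length() {
        return end - start;
    }
}
```

Because `substring` already takes (beginIndex, endIndex), a range in this shape maps onto it directly. APIs that take (start, length) would need the other accessor.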

But the mere existence of a Range type would do little to help String, Arrays, and other range-happy APIs, because we would have to update them to include new overloads that dispense and consume ranges. So that's a big project.

Still, I think investigating use cases involving libraries that work intensively with ranges like this would likely yield useful information for what a Range type would want to provide.


HTH,
-Brian






On 9/26/2024 9:07 AM, Olexandr Rotan wrote:

I've been researching the idea of deriving some iterable representations from ranges, and I am not here with good news.
Unlike range algebra and boolean operations, which generalize extremely well, iterability of ranges... well, it's safe to say it doesn't generalize at all. Analyzing the key features people expect iterable ranges to have, I concluded that there are basically two groups, or two use cases, for them. The first is plain and simple, and arguably the most popular one: iterating over a range of integers, i.e. `for (int i : Range.of(1, 10))`. The other is more complex iteration over ranges of reference types, most commonly dates/times.
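For reference, the JDK already serves each of these two use cases separately: `IntStream.range` for ints and `LocalDate.datesUntil` (Java 9+) for dates. Neither returns an `Iterable`, so neither plugs straight into an enhanced for loop, and the two share no common range abstraction. The class and method names below are illustrative only.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ExistingRanges {
    // Use case 1: integers. IntStream.range is half-open, so sumRange(1, 10) adds 1 through 9.
    static int sumRange(int startInclusive, int endExclusive) {
        return IntStream.range(startInclusive, endExclusive).sum();
    }

    // Use case 2: dates, with a domain-specific step expressed as a Period (end exclusive).
    static List<LocalDate> dates(LocalDate start, LocalDate endExclusive, Period step) {
        return start.datesUntil(endExclusive, step).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumRange(1, 10)); // 45
        System.out.println(dates(LocalDate.of(2024, 9, 1), LocalDate.of(2024, 10, 1), Period.ofWeeks(1)));
        // [2024-09-01, 2024-09-08, 2024-09-15, 2024-09-22, 2024-09-29]
    }
}
```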
By nature, values fall into two groups: discrete and continuous. Most types belong to the second group, such as floating-point values, since there is no direct increment and decrement for them (we will omit hardware limitations for simplicity). What is the increment of 1.3? Is it 1.31, or 1.30000000001, or maybe something even more unreadable? On the other hand, the increment of a LocalDate representing today, in the context of range iteration, is rather obvious: it is tomorrow.
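The contrast is visible in the JDK itself. For doubles, the closest thing to an increment is `Math.nextUp`, which returns the adjacent representable value; the size of that step is `Math.ulp`. For `LocalDate`, `plusDays(1)` is the obvious successor. The `NextValue` class is a hypothetical wrapper for this illustration.

```java
import java.time.LocalDate;

final class NextValue {
    // For a double, the only successor the type itself defines is the adjacent
    // representable double, which is rarely what a user means by "increment".
    static double nextDouble(double d) {
        return Math.nextUp(d);
    }

    // For a date, the natural successor is unambiguous: the following day.
    static LocalDate nextDate(LocalDate d) {
        return d.plusDays(1);
    }

    public static void main(String[] args) {
        System.out.println(nextDouble(1.3));                        // the adjacent double just above 1.3
        System.out.println(Math.ulp(1.3));                          // the step size, 2^-52 (about 2.2E-16)
        System.out.println(nextDate(LocalDate.of(2024, 9, 26)));    // 2024-09-27
    }
}
```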
There are only a few discrete types in the JDK: dates, whole numbers, and that's basically it. Discrete types outside the JDK, however, can be very diverse. For example, users can define a Comparable type "F1Team" and compare teams by their position in the last race. There, the increment would most likely be the next team in the standings. There are many domain-specific cases like this.
This is where the problem comes from. If the user always had to pass a comparator to create a range, it would be consistent to make the user define increment/decrement as well. But we don't want users to pass a comparator if the type is already Comparable. Similarly, we don't want users to define increment/decrement if there is already one in the language! I think defining increments for dates (say, LocalDate.plusDays(1)) would be acceptable, and even defining increments for floats in the context of ranges might be acceptable, but making people define increments for integers is, in my opinion, completely unacceptable. Besides the performance impact, it is a terrible user experience.
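To make the ergonomics concrete, here is a sketch of the "always pass everything" design, using a hypothetical `SteppedRange` class (not a proposed API). It assumes a closed range `[start, end]` and a successor function that strictly increases values. For dates the call reads reasonably. For integers, the caller has to restate both the natural ordering and `i -> i + 1`, and every element is boxed.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical closed range [start, end] whose iteration needs both an ordering and a
// successor from the caller. Assumes the successor strictly increases values; one that
// stalls or overflows (e.g. i -> i + 1 at Integer.MAX_VALUE) would loop forever.
final class SteppedRange<T> {
    private final T start;
    private final T end;
    private final Comparator<? super T> order;
    private final UnaryOperator<T> successor;

    SteppedRange(T start, T end, Comparator<? super T> order, UnaryOperator<T> successor) {
        this.start = start;
        this.end = end;
        this.order = order;
        this.successor = successor;
    }

    List<T> toList() {
        List<T> out = new ArrayList<>();
        for (T cur = start; order.compare(cur, end) <= 0; cur = successor.apply(cur)) {
            out.add(cur);
        }
        return out;
    }

    public static void main(String[] args) {
        // Dates: supplying plusDays(1) as the successor is arguably acceptable.
        SteppedRange<LocalDate> days = new SteppedRange<>(
                LocalDate.of(2024, 9, 24), LocalDate.of(2024, 9, 26),
                Comparator.naturalOrder(), d -> d.plusDays(1));
        System.out.println(days.toList()); // [2024-09-24, 2024-09-25, 2024-09-26]

        // Integers: the same shape forces callers to restate the ordering and i -> i + 1.
        SteppedRange<Integer> ints = new SteppedRange<>(1, 5, Comparator.naturalOrder(), i -> i + 1);
        System.out.println(ints.toList()); // [1, 2, 3, 4, 5]
    }
}
```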
There are a few possible solutions:
1) Define a ton of overloads for the factory methods, plus specialized types (ugh, that sounds awful).
2) Introduce a new interface, say Discrete<T>, that defines a T increment() method (and possibly T decrement()). From there, there are two branches:
2.1) Leave things as they are: allow users to define increment logic for their own types, but don't touch integers and other built-ins. I see this option as extremely inconsistent, and it doesn't solve the main issue, which is the iterability of integers.
2.2) Retrofit (scary) existing types to implement this interface. This should not have any compatibility or security implications, but sneaking into java.lang every time we need some new API to be more user-friendly is obviously not the way to go. This basically comes down to how deeply we want to integrate ranges into the language, and whether range generalization is even worth the invasion into the core of the language (IMO, yes).
3) Leave things as they are, and just let users derive iterables using something like range.asIterableWithStep(IncrementStrategy increment). I think this would make the API too narrow, as no one would use it for routine tasks the way people do in Rust, Kotlin, and other languages.
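Option 2 might look roughly like the sketch below. `Discrete<T>`, `Floor`, and `DiscreteRanges` are hypothetical names for illustration. The sketch includes only `increment()`, not `decrement()`, and iterates a closed range. A type that is both `Comparable` and `Discrete` can be iterated without passing a comparator or a step. The catch, as noted under 2.1, is that built-in types like `Integer` do not implement such an interface unless they are retrofitted.

```java
import java.util.ArrayList;
import java.util.List;

final class DiscreteRanges {
    // Iterates a closed range [start, end] of any type that is both ordered and discrete,
    // so callers pass neither a comparator nor a successor function.
    static <T extends Comparable<? super T> & Discrete<T>> List<T> closed(T start, T end) {
        List<T> out = new ArrayList<>();
        for (T cur = start; cur.compareTo(end) <= 0; cur = cur.increment()) {
            out.add(cur);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(closed(new Floor(1), new Floor(3)));
        // [Floor[number=1], Floor[number=2], Floor[number=3]]
        // closed(1, 3) would not compile: Integer does not implement the hypothetical
        // Discrete interface, which is the gap option 2.2 (retrofitting) would close.
    }
}

// Hypothetical interface from option 2; not part of the JDK.
interface Discrete<T> {
    T increment();
}

// Hypothetical user-defined discrete type, for illustration only.
record Floor(int number) implements Comparable<Floor>, Discrete<Floor> {
    @Override
    public int compareTo(Floor other) {
        return Integer.compare(number, other.number);
    }

    @Override
    public Floor increment() {
        return new Floor(number + 1);
    }
}
```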
I would love to hear the community's opinion on this matter. Which option is the most preferable? Maybe some compromise between a few of them, or maybe there is a better way to go that I didn't mention here?
Best regards
On Tue, Sep 24, 2024 at 5:11 PM Alan Snyder <javali...@cbfiddle.com> wrote:
    I have another example: I have a datatype that represents a region
    of an audio track, for example, one tune in a medley of tunes. I
    allow the region to
    specify both a start and end time, but the end time is optional
    (and mostly not used). When the end time is not specified, the
    region ends at the start of the next region, or at
    the end of the track if there is no next region. The latter case
    is useful because the exact track length may not be known. The
    optionality of the end time
    is not represented in the type system.

    Having said that, I’m not sure that a general abstract interface
    would be useful for this example.
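One way to make the optional end time visible in the type is sketched below. `Region`, `effectiveEnd`, and `RegionDemo` are hypothetical names, and times are modeled as `Duration`s from the start of the track. An empty result from `effectiveEnd` stands for "the end of the track", whose exact length may not be known. This rule is domain-specific, which fits the doubt above that a general interface would help here.

```java
import java.time.Duration;
import java.util.Optional;

final class RegionDemo {
    public static void main(String[] args) {
        Region intro = new Region(Duration.ZERO, Optional.empty());
        Region secondTune = new Region(Duration.ofSeconds(95), Optional.empty());
        System.out.println(intro.effectiveEnd(Optional.of(secondTune))); // Optional[PT1M35S]
        System.out.println(secondTune.effectiveEnd(Optional.empty()));   // Optional.empty
    }
}

// Hypothetical model of a region of an audio track whose end time is optional.
record Region(Duration start, Optional<Duration> end) {
    // Resolves where this region actually ends: its own end if given, otherwise the
    // start of the next region; an empty result means "the end of the track".
    Optional<Duration> effectiveEnd(Optional<Region> next) {
        if (end.isPresent()) {
            return end;
        }
        return next.map(Region::start);
    }
}
```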
    On Sep 24, 2024, at 2:13 AM, Olexandr Rotan
    <rotanolexandr...@gmail.com> wrote:

    As part of the redesign process, I am researching whether there
    are use cases that require asserting that a range is exactly
    half-bounded. This is important because I plan to switch to
    BoundedAtEnd/BoundedAtStart sealed interfaces instead of flags
    and runtime checks. Here is what I have gathered so far:

      * *Date/Time Handling (Historical or Forecast Data)*: When
        dealing with events that started at a specific time but have
        no known end (e.g., open-ended employment contracts or
        ongoing subscriptions).
      * *Stream Processing (Real-time Event Streams)*: In real-time
        systems, you might process data that has a start time but no
        defined end, such as monitoring a live video feed or logging
        system. The range is bounded at the start and unbounded at
        the end as more data will continuously arrive.
      * *Data Pagination (Fetch Until Condition)*: When implementing
        pagination, sometimes you might want to fetch items starting
        from a specific index up to an unbounded limit (e.g.,
        fetching all items after a certain point until memory runs
        out or a condition is met).
      * *Auditing and Monitoring*: In systems where audit trails or
        logging data should capture all events after a certain point
        (bounded start) with no foreseeable end (unbounded end), such
        as monitoring changes to records in a database starting from
        a fixed timestamp.
      * *Scientific or Statistical Ranges*: When modeling physical
        systems or statistical ranges, you might want to capture
        measurements that begin at a known threshold but
        theoretically have no bound in the other direction. For
        example, recording temperature data starting at absolute
        zero and increasing without any known upper limit.
      * *Inventory or Resource Allocation*: Resource allocation
        policies, such as those for virtual machines, may be based on
        known minimum allocation thresholds but have flexible or
        unbounded resource caps, depending on availability.

    I am writing to ask whether anyone who has worked with such
    systems could confirm or deny that these are real use cases. If
    so, would it be satisfactory to assert one-way unboundedness
    with instanceof checks, i.e. range instanceof UnboundedEndRange
    && !(range instanceof UnboundedStartRange)? I would appreciate
    any feedback.
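The half-bounded shapes discussed in this thread could be modeled roughly as follows. This is a minimal sketch under stated assumptions: `TimeRange`, `Closed`, `StartOnly`, `EndOnly`, `Unbounded`, and `HalfBoundedDemo` are hypothetical names, and `BoundedAtStart`/`BoundedAtEnd` borrow their names from the message. Bounds are `Instant`s and ranges are half-open. Once the bounds are types, the instanceof check from the message works directly, and a pattern can extract whichever bound is present. It needs Java 17 or later for sealed interfaces.

```java
import java.time.Instant;

final class HalfBoundedDemo {
    // The check proposed in the message: bounded at the start but not at the end.
    static boolean isStartOnly(TimeRange r) {
        return r instanceof BoundedAtStart && !(r instanceof BoundedAtEnd);
    }

    // Half-open membership test, start <= t < end, skipping any bound that is absent.
    static boolean contains(TimeRange r, Instant t) {
        boolean afterStart = !(r instanceof BoundedAtStart s) || !t.isBefore(s.start());
        boolean beforeEnd = !(r instanceof BoundedAtEnd e) || t.isBefore(e.end());
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        Instant subscribed = Instant.parse("2024-01-01T00:00:00Z");
        TimeRange ongoing = new StartOnly(subscribed); // e.g. an ongoing subscription
        System.out.println(isStartOnly(ongoing));                                            // true
        System.out.println(isStartOnly(new Closed(subscribed, subscribed.plusSeconds(60)))); // false
        System.out.println(contains(ongoing, Instant.parse("2030-01-01T00:00:00Z")));        // true
    }
}

// Hypothetical hierarchy for illustration; none of this is an existing API.
sealed interface TimeRange permits Closed, StartOnly, EndOnly, Unbounded {}

sealed interface BoundedAtStart permits Closed, StartOnly {
    Instant start();
}

sealed interface BoundedAtEnd permits Closed, EndOnly {
    Instant end();
}

record Closed(Instant start, Instant end) implements TimeRange, BoundedAtStart, BoundedAtEnd {}
record StartOnly(Instant start) implements TimeRange, BoundedAtStart {}
record EndOnly(Instant end) implements TimeRange, BoundedAtEnd {}
record Unbounded() implements TimeRange {}
```

With a sealed hierarchy, `r instanceof StartOnly` expresses the same check more directly; the two-interface form is shown because it matches the question as asked.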
