Changing the behavior of online tables instead of adding a new table state
seems reasonable.

One possible way to do this is to give every tablet in an online table a
default goal state of hosted.  A user could then define ranges of an
online table whose tablets are loaded on demand as needed.

We could add APIs to list, add, and remove these ranges for a table.  Or,
instead of having APIs for this, we could use per-table config to define
these ranges, e.g. a property named table.ondemand.ranges whose value is
a list of row ranges.

The balancer and clients would be aware of these ranges and would load the
on-demand tablets in those ranges as needed, based on activity.
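
To make the config idea a bit more concrete, here is a rough sketch of how
a table.ondemand.ranges value could be parsed and turned into a per-row
goal state.  Everything in it is an assumption for illustration only: the
class name, the GoalState enum, and the value format (comma-separated
"start:end" pairs, start exclusive, end inclusive, an empty side meaning
unbounded) are hypothetical and not existing Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names or the value format exist in Accumulo.
final class OnDemandRanges {

  // Assumed goal states: hosted is the default, ondemand applies inside a configured range.
  enum GoalState { HOSTED, ONDEMAND }

  // A row range (startExclusive, endInclusive]; null means unbounded on that side.
  static final class RowRange {
    final String startExclusive;
    final String endInclusive;

    RowRange(String startExclusive, String endInclusive) {
      this.startExclusive = startExclusive;
      this.endInclusive = endInclusive;
    }

    // Simplification: compares Java strings; real rows are byte sequences.
    boolean contains(String row) {
      boolean afterStart = startExclusive == null || row.compareTo(startExclusive) > 0;
      boolean beforeEnd = endInclusive == null || row.compareTo(endInclusive) <= 0;
      return afterStart && beforeEnd;
    }
  }

  // Parses the assumed format, e.g. "a:f, m:" -> (a,f] and (m,+inf).
  static List<RowRange> parse(String propertyValue) {
    List<RowRange> ranges = new ArrayList<>();
    if (propertyValue == null || propertyValue.trim().isEmpty()) {
      return ranges; // no ranges configured: every tablet keeps the default
    }
    for (String part : propertyValue.split(",")) {
      String[] ends = part.trim().split(":", -1);
      if (ends.length != 2) {
        throw new IllegalArgumentException("Bad range: " + part);
      }
      ranges.add(new RowRange(ends[0].isEmpty() ? null : ends[0],
                              ends[1].isEmpty() ? null : ends[1]));
    }
    return ranges;
  }

  // Rows inside any configured range are on demand; everything else stays hosted.
  static GoalState goalFor(List<RowRange> ranges, String row) {
    for (RowRange r : ranges) {
      if (r.contains(row)) {
        return GoalState.ONDEMAND;
      }
    }
    return GoalState.HOSTED;
  }
}
```

With a value like "a:f, m:", a lookup for row "c" would fall in an
on-demand range, while row "g" would stay hosted by default.  The balancer
and clients could consult something along these lines before deciding
whether a tablet needs to be loaded.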


On Tue, Mar 28, 2023, 10:26 AM Christopher <ctubb...@apache.org> wrote:

> I think we should deprecate support for offline table scanning, since
> it shouldn't be needed with the availability of ScanServers. Any
> MapReduce that previously relied on scanning offline tables could be
> made to use that instead.
>
> I agree there is a need to have an immutable table state, for which it
> is possible to read, but no changes can be made. However, even in that
> "locked" state, one should still be able to perform surgery on its
> metadata, or manually / surgically compact files (with the
> understanding that doing so will interfere with any concurrent export
> or scan operations that are relying on it being immutable, which I
> think is a tolerable amount of risk, when actually in a situation
> where such surgery is needed).
>
> As for "ondemand" table state, from a user perspective, I'm not sure
> what it means... is the "on-demand availability" applicable only for
> live ingest / immediate consistency? Is it still "always available"
> for bulk import / ScanServers? Or does "on-demand availability"
> somehow apply to all interactions, including bulk import and
> ScanServer reads?
>
> I think the "ondemand" state is confusing, because it's exposing
> internal state through to the user, and in a way that isn't as clear
> as the simple "online/offline" states used to be. Previously, users
> didn't need to understand what was going on internally... "online"
> just meant "I can interact with this table", and "offline" meant "I
> can't interact with this table". The user wasn't required to
> understand what a tablet was, or how it was hosted, or anything of
> that nature. As we started adding support for "offline" features, the
> lines separating "online and offline" meaning "available and
> unavailable" became blurred. As we proceed adding elasticity, I think
> we should work to make things more clear and explicit again... and I
> think "ondemand" as a table state, makes things even less clear when
> the concept is exposed to the user as a separate table state.
>
> I do think we need some kind of on-demand availability for live-ingest
> and immediate consistency in order to be more elastic, and from the
> discussion, it's obvious we need an immutable table state, but I think
> it's a mistake to expose the on-demand availability for live-ingest
> and immediate consistency as a new table state. I think that should be
> left as either some kind of automatic internal behavior, or as a
> secondary fine-grained control over an online table (like pinned
> tablets, either permanently pinned or temporally pinned, based on
> activity).
>
> On Tue, Mar 28, 2023 at 9:51 AM Drew Farris <d...@apache.org> wrote:
> >
> > On Mon, Mar 27, 2023 at 2:16 PM Keith Turner <ke...@deenlo.com> wrote:
> >
> > > One realization that came out examining the different table states is
> > > that export table currently relies on the fact that offline tables
> > > will not delete files.  If we enable compactions on offline tables
> > > then that could cause files to be deleted which would break the
> > > expectation of export table.
> > >
> >
> > This is a good point. I hadn't considered the potential breakage to
> > export table. I suspect another concern could be the hadoop input
> > format that operates over the rfiles in an offline table - and can do
> > so relatively safely because the table is not expected to change while
> > it is offline.
> >
> > So, it would seem that there is value in having an 'immutable' table
> > state in the form of an offline table. Perhaps 'ondemand' is the
> > alternate state that lets us do things like import, split, compact,
> > merge, etc.
>
