Except for the new bulk import code, Accumulo requires that tables be in
an online state to work with them (ingest, scan, compact, split, etc.). In
some cases this can become cost prohibitive and resource inefficient, since
the resources needed to keep the tables online might sit unused. I'd like to
propose a new capability for Accumulo - the ability to work with tables
that are not online. This could mean either working with tables in an
offline state or assigning/hosting tables/tablets on demand.

At a high level, the two ideas currently being discussed are described below. I think
in both cases the root and metadata tables must be online, table management
functions move to manager components, and compactions of offline tables
move to the external compaction processes. In addition, new metrics would
need to be emitted so that an external resource scheduler could spin
up/down server processes as demand changes.
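
As a rough illustration only of how an external resource scheduler might
turn such a metric into a scaling decision, here is a minimal sketch. The
metric (a count of tablets that currently need a host), the per-server
capacity, and the class and method names are all hypothetical; none of them
are existing Accumulo APIs or metrics.

```java
final class ScalingSketch {

    private ScalingSketch() {}

    /**
     * Computes how many server processes a scheduler might want running.
     *
     * tabletsNeedingHost is a hypothetical emitted metric; tabletsPerServer,
     * minServers and maxServers are assumed scheduler settings.
     */
    static int desiredServers(int tabletsNeedingHost, int tabletsPerServer,
                              int minServers, int maxServers) {
        if (tabletsPerServer <= 0) {
            throw new IllegalArgumentException("tabletsPerServer must be positive");
        }
        if (tabletsNeedingHost < 0 || minServers > maxServers) {
            throw new IllegalArgumentException("invalid demand or bounds");
        }
        // Ceiling division: a partially filled server still counts as one server.
        int needed = (tabletsNeedingHost + tabletsPerServer - 1) / tabletsPerServer;
        // Clamp to the configured bounds so the scheduler never scales past them.
        return Math.max(minServers, Math.min(maxServers, needed));
    }
}
```

With an assumed capacity of 100 tablets per server and bounds of 1 to 10
servers, a demand of 250 tablets would yield 3 servers, while no demand at
all would scale down to the minimum of 1.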


*Offline Operations*

This approach allows all operations to occur on offline tables at the cost
of eventually consistent data at scan time (via Scan Servers only). Live
ingest could be supported by creating an ingest server component that just
receives mutations and performs minor compactions.

*On-demand Tables*

This approach allows user tables to be offline and unhosted, but hosts
them on demand for live ingest and immediate scans, at the latency cost of
possibly having to assign and host the tablet first.
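
The trade-off in the on-demand approach could be sketched roughly as
follows. This is a toy in-memory model, not a design: the class, its
methods, and in particular the idle timeout used to unhost tablets again
are assumptions made for illustration and do not come from any existing
Accumulo code.

```java
import java.util.HashMap;
import java.util.Map;

final class OnDemandHostingSketch {

    // Hosted tablets mapped to the time of their last access (hypothetical bookkeeping).
    private final Map<String, Long> lastAccessMillis = new HashMap<>();
    private final long idleTimeoutMillis;

    OnDemandHostingSketch(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /**
     * Records an ingest or scan request for a tablet. Returns true when the
     * tablet was not hosted, i.e. this request pays the assign/host latency.
     */
    boolean access(String tabletId, long nowMillis) {
        boolean hadToHost = !lastAccessMillis.containsKey(tabletId);
        lastAccessMillis.put(tabletId, nowMillis);
        return hadToHost;
    }

    /** Unhosts tablets idle for at least the timeout; returns how many were unhosted. */
    int unhostIdle(long nowMillis) {
        int before = lastAccessMillis.size();
        lastAccessMillis.values().removeIf(last -> nowMillis - last >= idleTimeoutMillis);
        return before - lastAccessMillis.size();
    }

    boolean isHosted(String tabletId) {
        return lastAccessMillis.containsKey(tabletId);
    }
}
```

In this model only the first request after a tablet has been unhosted pays
the hosting cost; later requests find the tablet already hosted until it
goes idle again.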

We have a few releases (1.10.3, 2.1.1, and 3.0.0) likely coming up in the
next month or two, but after that I'd like to start implementing something
to address this. Please contribute to the discussion if you have thoughts
on requirements, design, etc.
