Except for the new bulk import code, Accumulo requires that tables be online in order to work with them (ingest, scan, compact, split, etc.). In some cases this could become cost prohibitive and resource inefficient, as the resources needed to keep the tables online might go unused. I'd like to propose a new capability for Accumulo: the ability to work with tables that are not online. This could mean either working with tables in an offline state, or the ability to assign/host tables/tablets on demand.
At a high level, the two ideas currently being discussed are below. I think that in both cases the root and metadata tables must be online, table management functions move to manager components, and compactions of offline tables move to the external compaction processes. In addition, new metrics would need to be emitted so that an external resource scheduler could spin server processes up or down as demand changes.

*Offline Operations*

This approach allows all operations to occur on offline tables, at the cost of eventual consistency of the data at scan time (via Scan Servers only). Live ingest could be supported by creating an ingest server component that just receives mutations and minor compacts.

*On-demand Tables*

This approach allows user tables to be offline and un-hosted, but hosts them on demand for live ingest and immediate scans, at the latency cost of possibly having to assign and host the tablet first.

We have a few releases (1.10.3, 2.1.1, and 3.0.0) coming up, likely in the next month or two, but after that I'd like to start implementing something to address this. Please contribute to the discussion if you have thoughts on requirements, design, etc.