In that case, I think it's probably sufficient to let users know the risks of bulk importing into an offline table and never bringing it online for compactions. It seems like that's a risk some users might be okay with for their use case.

On Thu, Mar 23, 2023, 19:38 Dave Marion <dmario...@gmail.com> wrote:

> Yes, if the table is never brought online. I believe that Keith said that
> the table could still be scanned when offline with existing MapReduce code
> or the OfflineScanner, which presents an issue that is not currently
> handled. I think we discussed today that the same thing could be achieved
> with tables in the on demand state. The reason to not modify an offline
> table is the export case, where the table needs to be immutable until the
> files are copied.
>
> On Thu, Mar 23, 2023, 6:58 PM Christopher <ctubb...@apache.org> wrote:
>
>> What do you mean by "when not used in this manner"? What other way is
>> there to use that feature? Do you mean simply never being brought
>> online?
>>
>> Would it be possible to support (external) compactions for an offline
>> table?
>>
>> I feel like that's a pretty useful feature to revert, and would want
>> to consider alternatives.
>>
>> On Thu, Mar 23, 2023 at 6:39 PM Dave Marion <dmario...@gmail.com> wrote:
>> >
>> > Keith and I had a discussion today (that included some user input)
>> > regarding table operations with the new OnDemand table concept. I have
>> > put the notes up on the wiki at:
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=247828052
>> > One thing that came out of that is that we may want to revert the
>> > change in the new bulk import code that allows a user to import into an
>> > offline table. The feature allows a user to create a table that is
>> > initially offline, bulk import data into it, then bring it online.
>> > However, when not used in this manner the number of bulk import files
>> > would continue to grow because compactions are never run on the table.
>> >
>> > On Mon, Mar 20, 2023 at 9:37 AM Dave Marion <dmario...@gmail.com> wrote:
>> >
>> > > Following up on this. Discussion and design documents are up on the
>> > > wiki[1]. There is a GitHub project[2] for planning out some of the
>> > > tasks, which are then turned into issues. Some of the issues have
>> > > draft PRs submitted for them.
>> > >
>> > > [1] https://cwiki.apache.org/confluence/display/ACCUMULO/Elasticity
>> > > [2] https://github.com/orgs/apache/projects/164
>> > >
>> > > On Wed, Feb 22, 2023 at 2:35 PM Dave Marion <dmario...@gmail.com> wrote:
>> > >
>> > >> Except for the new bulk import code, Accumulo requires that tables
>> > >> are in an online state to work with them (ingest, scan, compact,
>> > >> split, etc.). In some cases this could become cost prohibitive and
>> > >> resource inefficient as resources necessary to keep the tables
>> > >> online might be unused. I'd like to propose a new capability for
>> > >> Accumulo - the ability to work with tables that are not online.
>> > >> This could either mean working with tables in an offline state, or
>> > >> maybe the ability to assign/host tables/tablets on demand.
>> > >>
>> > >> At a high level the two ideas currently being discussed are below.
>> > >> I think in both cases the root and metadata tables must be online,
>> > >> table management functions move to manager components, and
>> > >> compactions of offline tables move to the external compaction
>> > >> processes. In addition, new metrics would need to be emitted so
>> > >> that an external resource scheduler could spin up/down server
>> > >> processes as demand changes.
>> > >>
>> > >> *Offline Operations*
>> > >>
>> > >> This approach allows all operations to occur on offline tables at
>> > >> the cost of having eventual consistency to the data at scan time
>> > >> (via Scan Servers only). Live ingest could be supported through the
>> > >> creation of an ingest server component that just receives mutations
>> > >> and minor compacts.
>> > >>
>> > >> *On-demand Tables*
>> > >>
>> > >> This approach allows for user tables to be offline and un-hosted,
>> > >> but hosts them on demand for the purpose of live ingest and
>> > >> immediate scans at the latency cost of possibly assigning and
>> > >> hosting the tablet.
>> > >>
>> > >> We have a few releases (1.10.3, 2.1.1, and 3.0.0) coming up in
>> > >> likely the next month or two, but after that I'd like to start
>> > >> implementing something to address this. Please contribute to the
>> > >> discussion if you have thoughts on requirements, design, etc.