Re: Dynamic Scaling of Accumulo

Christopher Thu, 23 Mar 2023 17:15:40 -0700

In that case, I think it's probably sufficient to let the users know the
risks of bulk importing into an offline table and never bringing it online
for compactions. It seems like that's a risk some users might be okay with
for their use case.


On Thu, Mar 23, 2023, 19:38 Dave Marion <dmario...@gmail.com> wrote:

> Yes, if the table is never brought online. I believe that Keith said that
> the table could still be scanned when offline with existing MapReduce code
> or the OfflineScanner, which presents an issue that is not currently
> handled. I think we discussed today that the same thing could be achieved
> with tables in the on demand state. The reason to not modify an offline
> table is the export case, where the table needs to be immutable until the
> files are copied.
>
> On Thu, Mar 23, 2023, 6:58 PM Christopher <ctubb...@apache.org> wrote:
>
>> What do you mean by "when not used in this manner"? What other way is
>> there to use that feature? Do you mean simply never being brought
>> online?
>>
>> Would it be possible to support (external) compactions for an offline
>> table?
>>
>> I feel like that's a pretty useful feature to revert, and would want
>> to consider alternatives.
>>
>> On Thu, Mar 23, 2023 at 6:39 PM Dave Marion <dmario...@gmail.com> wrote:
>> >
>> > Keith and I had a discussion today (that included some user input)
>> > regarding table operations with the new OnDemand table concept. I have
>> > put the notes up on the wiki at:
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=247828052
>> > One thing that came out of that is that we may want to revert the
>> > change in the new bulk import code that allows a user to import into an
>> > offline table. The feature allows a user to create a table that is
>> > initially offline, bulk import data into it, then bring it online.
>> > However, when not used in this manner the number of bulk import files
>> > would continue to grow because compactions are never run on the table.
>> >
>> > On Mon, Mar 20, 2023 at 9:37 AM Dave Marion <dmario...@gmail.com> wrote:
>> >
>> > > Following up on this. Discussion and design documents are up on the
>> > > wiki[1]. There is a GitHub project[2] for planning out some of the
>> > > tasks, which are then turned into issues. Some of the issues have draft
>> > > PRs submitted for them.
>> > >
>> > > [1] https://cwiki.apache.org/confluence/display/ACCUMULO/Elasticity
>> > > [2] https://github.com/orgs/apache/projects/164
>> > >
>> > > On Wed, Feb 22, 2023 at 2:35 PM Dave Marion <dmario...@gmail.com> wrote:
>> > >
>> > >> Except for the new bulk import code, Accumulo requires that tables
>> > >> are in an online state to work with them (ingest, scan, compact, split,
>> > >> etc.). In some cases this could become cost prohibitive and resource
>> > >> inefficient as resources necessary to keep the tables online might be
>> > >> unused. I'd like to propose a new capability for Accumulo - the ability
>> > >> to work with tables that are not online. This could either mean working
>> > >> with tables in an offline state, or maybe the ability to assign/host
>> > >> tables/tablets on demand.
>> > >>
>> > >> At a high level the two ideas currently being discussed are below. I
>> > >> think in both cases the root and metadata tables must be online, table
>> > >> management functions move to manager components, and compactions of
>> > >> offline tables move to the external compaction processes. In addition,
>> > >> new metrics would need to be emitted so that an external resource
>> > >> scheduler could spin up/down server processes as demand changes.
>> > >>
>> > >>
>> > >> *Offline Operations*
>> > >>
>> > >> This approach allows all operations to occur on offline tables at the
>> > >> cost of having eventual consistency to the data at scan time (via Scan
>> > >> Servers only). Live ingest could be supported through the creation of
>> > >> an ingest server component that just receives mutations and minor
>> > >> compacts.
>> > >>
>> > >>
>> > >>
>> > >> *On-demand Tables*
>> > >> This approach allows for user tables to be offline and un-hosted, but
>> > >> hosts them on demand for the purpose of live ingest and immediate scans
>> > >> at the latency cost of possibly assigning and hosting the tablet.
>> > >>
>> > >> We have a few releases (1.10.3, 2.1.1, and 3.0.0) coming up in likely
>> > >> the next month or two, but after that I'd like to start implementing
>> > >> something to address this. Please contribute to the discussion if you
>> > >> have thoughts on requirements, design, etc.
>> > >>
>> > >>
>> > >>
>> > >>
>>
>
