After some additional consideration, and getting a better understanding of how the code is expected to work from discussing it with Dave... I'm a little more inclined to support #2422 in 2.1, provided:
1. There's time for me to review it,
2. It is sufficiently decoupled from the existing code and marked experimental, so that we have the flexibility to alter its design if that seems appropriate after it gets some exposure following the release,
3. Unit tests and integration tests are reliably passing (as stable as, or more stable than, they are currently),
4. No serious issues are discovered during review, and
5. It doesn't delay a release past early June, as I think this is a reasonable target date.

This is my wishlist before I can get behind it with a +1 for 2.1. If these aren't met, I do not intend to veto, but I'd be a -0 on its inclusion in 2.1. Of course, once I review it, my thoughts may change a bit.

On Mon, Apr 4, 2022 at 7:07 PM Mike Miller <mmil...@apache.org> wrote:
> I think I can finish the FATE refactor PR [1] for 2.1. I had been keeping it up to date with the latest in main but stopped because it was too much work. I was waiting until the ZK property changes are completed before resolving the latest conflicts. I don't think it is much of a risk. It is mostly cleanup and refactoring to remove generics from the serialization code. It will be some work to revisit, but I think the risk is pretty low. It would allow changing the serialization, which we may be able to get into 2.1 as well.
>
> [1] https://github.com/apache/accumulo/pull/2475
>
> On Mon, Apr 4, 2022 at 11:50 AM Keith Turner <ke...@deenlo.com> wrote:
> >
> > On Mon, Apr 4, 2022 at 11:17 AM Christopher <ctubb...@apache.org> wrote:
> > >
> > > I haven't seen the metrics test fail very often lately. If it's stable, I don't mind removing the blocker on that issue, but I'd be reluctant to close it entirely just yet, until we can verify it doesn't happen anymore.
> > >
> > > As for the original list of potential issues to include, I'm in favor of trying to get #2197 in. It was started a while ago, is relatively simple, and is well understood by several of us already... it just needs a bit of attention to finalize reviews so it can be merged.
> > >
> > > However, I'm reluctant to include #2422, because I don't think it's near ready enough, and by the time it is, it will be very last minute, and I don't want to delay 2.1 further for it. Even if it's included as an experimental feature, I think it has huge potential to be disruptive, or to have a lot of churn by the time people actually have a chance to review it thoroughly. Furthermore, I think there are possible alternatives (like a fully client-side implementation, based on offline scanners) that would avoid the tight coupling of a new service to Accumulo's core code. This
> >
> > There are some advantages of scan servers over direct file access to consider. One is scalability of computation: if a web server is serving N client queries, with scan servers those queries can potentially go to different scan servers. With direct file access, all N queries and their iterator stacks would have to run in the web server. Another is scalability of caching/memory. When web servers send queries to scan servers using a sticky algorithm for assigning tablets to groups of scan servers, it could lead to good cache utilization and sharing that may not be possible when running scans directly in the web server. So scan servers allow scaling cache and computation for queries independently of web servers in a way that may not be possible with direct file access.
> >
> > Another advantage to consider is isolation. With direct file access and queries running directly in a web server, a bad query could bring down a web server and lots of unrelated queries. Having a bad query bring down a scan server may be less disruptive.
> >
> > > thread isn't for discussing this in depth, so we can have that discussion in a separate thread, but I'm generally opposed to including it this late in 2.1's development, given the timing, size and scope, tight coupling, and current state.
> > >
> > > I don't know enough about #2475 to have a strong opinion, but it looks big, and possibly high-risk, given the critical code it touches. It currently has a substantial number of conflicts with the main branch. However, I was thinking that *some* minimal refactoring (like low-risk automatic refactoring, such as moving packages) could be done. So, if that's all this does, it might be okay. Otherwise, maybe it can be simplified? At the very least, I was thinking it would be a good opportunity to move the `org.apache.accumulo.fate` packages into an appropriate `org.apache.accumulo.core` parent package (some would go to o.a.a.core.fate and others might go to o.a.a.core.util or similar) to keep the package namespaces standardized, which is helpful to avoid naming collisions and jar sealing issues, as well as for less complicated jigsaw module definitions in the future. Since 2.1 FaTE is already incompatible with prior versions, a rename at this time would be less disruptive.
> > >
> > > There's another task I had wanted done for 2.1, before I got distracted fixing test failures during and after Christmas and trying to work through the singleton manager ZooKeeper stuff to see what we could simplify. What I had wanted done was to standardize the way we pass table identifiers (names, IDs) across the RPC layer, since we currently do that inconsistently. I don't remember if there's an existing ticket open for it, but I have a working branch for it that I started before Christmas. It's relatively simple work, and would set us up for some much better APIs going forward, as well as help with logging information about table actions. If necessary, it could be bumped to a future version, but then we'd have more churn in the Thrift layer. So, I'd prefer to get it into 2.1 to avoid that.
> > >
> > > As for planning, I was thinking early May for a code freeze (except bug fixes and small improvements found during testing), so we can try to release towards the end of May/early June. If we go with that timeline, that's not a lot of time to wrap up features and have time for review/testing, so we may need to be selective about what we hold off until the next version, unless we want to further delay 2.1.
> > >
> > >
> > > On Mon, Apr 4, 2022 at 9:13 AM Dave Marion <dmario...@gmail.com> wrote:
> > > >
> > > > I think [3] is OBE and can be closed.
> > > >
> > > > On Mon, Apr 4, 2022 at 9:11 AM Mike Miller <mmil...@apache.org> wrote:
> > > > >
> > > > > Yes, I agree; that was the goal of this email thread. I found a few more tickets that should be addressed for the next release.
> > > > >
> > > > > Ivan - There was some work done on this PR, but it has been some time. Do you want to take a look at it? Implement a Thread limit. [1]
> > > > > Keith T - I think we should get this one merged to fix that consistency check bug I found. It looks like it is finished. [2]
> > > > > Dave & Dom - Were you guys able to figure out a fix for the new external compaction metrics test? [3]
> > > > >
> > > > > FYI we have 6 blockers for 2.1:
> > > > > https://github.com/apache/accumulo/labels/blocker
> > > > >
> > > > > This is almost definitely going into 2.1 [4]. Thanks Jeff!
> > > > >
> > > > > [1] https://github.com/apache/accumulo/pull/1487
> > > > > [2] https://github.com/apache/accumulo/pull/2574
> > > > > [3] https://github.com/apache/accumulo/issues/2406
> > > > > [4] https://github.com/apache/accumulo/pull/2215
> > > > >
> > > > > On Fri, Apr 1, 2022 at 2:21 PM Dave Marion <dmario...@gmail.com> wrote:
> > > > > >
> > > > > > I think it would be useful to do some release planning so that we know what features we are working towards and which release they will be in. This would be helpful for determining which existing PRs need to make it into 2.1.0. 2.1.0 is the LTM release, so patches for existing features will be backported (2.1.1, 2.1.2, 2.1.3, etc.). However, as defined in [1], features that don't make it into 2.1.0 will go into the next non-LTM release (2.2.0), and any patches to bugs in those features will go into the next non-LTM release after that (2.3.0).
> > > > > >
> > > > > > I'm not trying to hold up the 2.1.0 release by suggesting that we perform this activity. I'm just asking what the future holds, even if it's just one feature in the next non-LTM release. My concern is that the next release will be open-ended and anything not included in 2.1.0 might not get put into a release for a very long time.
> > > > > >
> > > > > > [1] https://accumulo.apache.org/contributor/versioning.html#LTM
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 31, 2022 at 11:43 AM Mike Miller <mmil...@apache.org> wrote:
> > > > > > >
> > > > > > > Starting an email chain of things that folks want to finish for 2.1. Here is what we currently have in the works that is most likely going into 2.1:
> > > > > > > https://github.com/apache/accumulo/pull/2569
> > > > > > > https://github.com/apache/accumulo/pull/2600
> > > > > > > https://github.com/apache/accumulo/pull/2293
> > > > > > >
> > > > > > > Some things that may go into 2.1:
> > > > > > > https://github.com/apache/accumulo/pull/2422
> > > > > > > https://github.com/apache/accumulo/pull/2475
> > > > > > > https://github.com/apache/accumulo/pull/2197
> > > > > > >
> > > > > > > I created a Project for follow-on work to the ZK property change. I was planning on putting tasks in there that we want to complete for 2.1, but we could also use it for post-2.1 work.
> > > > > > > https://github.com/apache/accumulo/projects/24
> > > > > > > https://github.com/apache/accumulo/issues/2469
> > > > > > >
> > > > > > > FYI a draft copy of the release notes is already on the website:
> > > > > > > https://accumulo.apache.org/release/accumulo-2.1.0/
> > > > > > >
> > > > > > > This may be a good thread to discuss whether a task needs to go into 2.1 or should wait for the next version. We currently have 32 open pull requests, so please email me if there is one that you would like prioritized for 2.1.