Jacob, Josh and I had a discussion about the topic. I'm attaching the dependency graph of the proposed modules.
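The attached dependency graph is not included in this archive. As a rough illustration only, the module layout from the original 4/7/21 proposal quoted below can be written down as a dependency map and checked for cycles with a depth-first search. In that layout, phoenix-core depends on hbase-client, and phoenix-coprocessors and phoenix-tools each depend on phoenix-core. Maven refuses to build a reactor whose modules depend on each other cyclically, which is why moving the classes into separate modules makes the client/server split enforceable instead of fragile. The class and method names below and the "mapreduce" placeholder node are hypothetical and not part of Phoenix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: the module layout from the original proposal as a
 * dependency map, plus a depth-first search that reports a dependency cycle.
 */
public class ModuleGraphCheck {

    private static final int ON_PATH = 1; // node is on the current DFS path
    private static final int DONE = 2;    // node and all its dependencies were fully explored

    /** Module -> modules it depends on (external artifacts are leaf nodes). */
    static Map<String, List<String>> proposedLayout() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("phoenix-core", List.of("hbase-client"));
        deps.put("phoenix-coprocessors", List.of("phoenix-core", "hbase-server"));
        deps.put("phoenix-tools", List.of("phoenix-core", "mapreduce")); // "mapreduce" is a placeholder
        return deps;
    }

    /** Returns one cycle (first module repeated at the end), or an empty list if there is none. */
    static List<String> findCycle(Map<String, List<String>> deps) {
        Map<String, Integer> state = new HashMap<>();
        for (String module : deps.keySet()) {
            List<String> cycle = visit(module, deps, state, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String node, Map<String, List<String>> deps,
                                      Map<String, Integer> state, List<String> path) {
        int s = state.getOrDefault(node, 0);
        if (s == DONE) {
            return List.of();
        }
        if (s == ON_PATH) {
            // Back edge: the cycle is the part of the current path starting at this node.
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
            cycle.add(node);
            return cycle;
        }
        state.put(node, ON_PATH);
        path.add(node);
        for (String dep : deps.getOrDefault(node, List.of())) {
            List<String> cycle = visit(dep, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        state.put(node, DONE);
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> layout = proposedLayout();
        System.out.println("proposed layout cycle: " + findCycle(layout));

        // Simulate a regression: a class in phoenix-core starts referencing a coprocessor class.
        Map<String, List<String>> regressed = new LinkedHashMap<>(layout);
        regressed.put("phoenix-core", List.of("hbase-client", "phoenix-coprocessors"));
        System.out.println("regressed layout cycle: " + findCycle(regressed));
    }
}
```

Running main prints an empty list for the proposed layout and [phoenix-core, phoenix-coprocessors, phoenix-core] once core is made to depend on the coprocessor module, which is exactly the kind of regression that goes unnoticed while everything lives in one module.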
On Fri, Apr 9, 2021 at 6:30 AM Istvan Toth <[email protected]> wrote:
> The bulk of the changes I'm working on is indeed the separation of the client and the server side code.
>
> Separating the MR-related classes and the tools-specific code (main, options parsing, etc.) makes sense to me, if we don't mind adding another module.
>
> In the first WIP iteration, I'm splitting out everything that depends on more than hbase-client into a "server" module.
> Once that works I will look at splitting that further into a real "server" and an "MR/tools" module.
>
> My initial estimates about splitting the server side code were way too optimistic: we have to touch a lot of code to break circular dependencies between the client and server side. The changes are still quite trivial, but the patch is going to be huge and scary.
>
> Tests are also going to be a problem. We're probably going to have to move most of them into the "server" module or a separate "tests" module, as the MiniCluster tests depend on code from each module.
>
> The plan in PHOENIX-5483 and in Lars's mail sounds good, but I think that it would be more about dividing the "client-side" module further.
> (BTW I think that making the indexing engine available separately would also be a popular feature.)
>
> On Fri, Apr 9, 2021 at 5:39 AM Daniel Wong <[email protected]> wrote:
>
>> This is another project that my group at Salesforce and I are interested in. We have had some internal discussions on this, but I wasn't aware of this specific Spark issue (we only allow Phoenix access via Spark by default). I think the approaches outlined are a good initial step, but we were also considering a larger breakup of phoenix-core. I don't think the desire for the larger step should stop us from doing the initial ones Istvan and Josh proposed.
>> I think the high-level plan makes sense, but I might prefer a different name than phoenix-tools for the modules we want to be available to external libraries like phoenix-connectors. Another alternative might be to restructure less invasively, by making phoenix-core like your proposed tools module and creating a phoenix-internal (or similar) module for the future.
>> One thing I was wondering was how much effort it would take to split client/server through phoenix-core... Lars laid out a good component view of Phoenix whose first step might be PHOENIX-5483, but we could focus on the highest-level separation rather than going bottom-up. However, even the thread linked there talks about a client-facing API that we could piggyback on for this use. Say phoenix-public-api or similar.
>>
>> On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <[email protected]> wrote:
>>
>> > Hi Josh & Istvan
>> >
>> > Thanks Istvan for looking into this. I am also interested in solving this problem.
>> > Let me know how I can help.
>> >
>> > Thanks
>> > Jacob
>> >
>> > On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <[email protected]> wrote:
>> >
>> > > Thanks for trying to tackle this sticky problem, Istvan. For the context of everyone else, the real-life problem Istvan is trying to fix is that you cannot run a Spark application with both HBase and Phoenix jars on the classpath.
>> > >
>> > > If I understand this correctly, it's that the HBase API signatures are different depending on whether we are "client side" or "server side" (within a RegionServer). Your comment on PHOENIX-6053 shows that (signatures on Table.java around Protobuf's Service class having shaded relocation vs. the original com.google.protobuf coordinates).
>> > >
>> > > I think the reason we have the monolithic phoenix-core is that we have so much logic which is executed on both the client and server side.
>> > > For example, we may push a filter operation to the server side or we may run it client-side. That's also why we have the "thin" phoenix-server Maven module, which just re-packages phoenix-core.
>> > >
>> > > Is it possible that we change phoenix-server so that it contains the "server-side" code that we don't want using the HBase classes with thirdparty relocations, rather than introducing another new Maven module?
>> > >
>> > > Looking through your WIP PR too.
>> > >
>> > > On 4/7/21 1:10 AM, Istvan Toth wrote:
>> > > > Hi!
>> > > >
>> > > > I've been working on getting Phoenix working with hbase-shaded-client.jar, and I am finally getting traction.
>> > > >
>> > > > One of the issues that I encountered is that we are mixing client and server side code in phoenix-core, and there's a mutual interdependence between the two.
>> > > >
>> > > > Fixing this is not hard, as it's mostly about replacing .class.getName() calls with string constants and moving around some inconveniently placed static utility methods. I now have a WIP version where the client side doesn't depend on server classes.
>> > > >
>> > > > However, unless we change the project structure and factor out the classes that depend on server-side APIs, this will be extremely fragile, as any change can (and will) re-introduce the circular dependency between the classes.
>> > > >
>> > > > To solve this issue I propose the following:
>> > > >
>> > > > - clean up phoenix-core, so that only classes that depend only on *hbase-client* (or at worst only on classes that are present in *hbase-shaded-client*) remain.
>> > > >   This should be 90+% of the code.
>> > > > - move all classes (mostly coprocessors and their support code) that use the server API (*hbase-server* mostly) to a new module, say phoenix-coprocessors (the phoenix-server module name is taken). This new module depends on phoenix-core.
>> > > > - move all classes that directly depend on MapReduce, and their main() classes, to the existing phoenix-tools module (which also depends on core)
>> > > >
>> > > > The separation would be primarily based on API use. At the first cut I'd be fine with keeping all logic in phoenix-core and referencing that. We may or may not want to move logic that is only used in coprocessors or tools, but doesn't use the respective APIs, to the new modules later.
>> > > >
>> > > > As for the main artifacts:
>> > > >
>> > > > - *phoenix-server.jar* would include code from all three modules.
>> > > > - A newly added *phoenix-client-byo-shaded-hbase.jar* would include only the code from the cleaned-up phoenix-core.
>> > > > - Ideally, we'd remove the tools and coprocessor code (and dependencies) from the standard and embedded clients, and switch the documentation to use *phoenix-server* to run the MR tools, but this is optional.
>> > > >
>> > > > I am tracking this work in PHOENIX-6053, which has a (currently working) WIP patch attached.
>> > > >
>> > > > I think that this change would fit the pattern established by creating the phoenix-tools module, but as this is a major change in project structure (even if the actual Java changes are trivial), I'd like to gather your input on this approach (please also speak up if you agree).
>> > > >
>> > > > regards
>> > > > Istvan
>
> --
> *István Tóth* | Staff Software Engineer
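The first fix described in the original 4/7/21 message, replacing .class.getName() references with string constants, can be sketched as follows. Client-side code refers to a server-side class only by its name, so it compiles without the server class on the classpath. A check that can see both sides (for example, a test in the server module) keeps the constant in sync if the class is later renamed or moved. SampleRegionObserver, CoprocessorClassNames and the other names are hypothetical stand-ins, not actual Phoenix classes, and everything sits in one file only so that the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of referring to a server-side class by name from client-side code. */
public class CoprocessorNameDemo {

    /** Client-side constants (would live in the client module); no import of server classes. */
    static final class CoprocessorClassNames {
        // Must equal the binary name of the server-side class; the stand-in below is in the default package.
        static final String SAMPLE_OBSERVER = "SampleRegionObserver";

        private CoprocessorClassNames() {
        }
    }

    /** Client-side logic: names of the coprocessors a table should load, built from strings only. */
    static List<String> coprocessorsFor(boolean indexed) {
        List<String> names = new ArrayList<>();
        if (indexed) {
            // Before: names.add(SampleRegionObserver.class.getName());
            // that line forces the client code to compile against the server-side class.
            names.add(CoprocessorClassNames.SAMPLE_OBSERVER);
        }
        return names;
    }

    /** Sync check (would live in the server module's tests, which can see both sides). */
    static boolean constantMatchesClass() {
        return CoprocessorClassNames.SAMPLE_OBSERVER.equals(SampleRegionObserver.class.getName());
    }

    public static void main(String[] args) {
        System.out.println("coprocessors for indexed table: " + coprocessorsFor(true));
        System.out.println("constant in sync: " + constantMatchesClass());
    }
}

/** Stand-in for a server-side coprocessor class (hypothetical; would live in the server module). */
class SampleRegionObserver {
}
```

The trade-off is that the compiler no longer catches a stale name on the client side, which is why the sync check belongs somewhere that depends on both modules.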
