The bulk of the changes I'm working on is indeed the separation of the client and the server side code.
Separating the MR related classes, and the tools-specific code (main, options parsing, etc) makes sense to me, if we don't mind adding another module. In the first WIP iteration, I'm splitting out everything that depends on more than hbase-client into a "server" module. Once that works I will look at splitting that further into a real "server" and an "MR/tools" module. My initial estimates about splitting the server side code were way too optimistic, we have to touch a lot of code to break circular dependencies between the client and server side. The changes are still quite trivial, but the patch is going to be huge and scary. Tests are also going to be a problem, we're probably going to have to move most of them into the "server" or a separate "tests" module, as the MiniCluster tests depend on code from each module. The plan in PHOENIX-5483, and Lars's mail sounds good, but I think that it would be more about dividing the "client-side" module further. (BTW I think that making the indexing engine available separately would also be a popular feature ) On Fri, Apr 9, 2021 at 5:39 AM Daniel Wong <[email protected]> wrote: > This is another project I am interested in as well as my group at > Salesforce. We have had some discussions internally on this but I wasn't > aware of this specific Spark issue (We only allow phoenix access via spark > by default). I think the approaches outlined are a good initial step but > we were also considering a larger breakup of phoenix-core. I don't > think the desire for the larger step should stop us from doing the initial > ones Istavan and Josh proposed. I think the high level plan makes sense > but I might prefer a different name than phoenix-tools for the ones we want > to be available to external libraries like phoenix-connectors. Another > possible alternative is to restructure maybe less invasively by making > phoenix core like your proposed tools and making a phoenix-internal or > similar for the future. > One thing I was wondering was how much effort it was to split client/server > through phoenix-core... Lars layed out a good component view of phoenix > whosethe first step might be PHOENIx-5483 but we could focus on highest > level separation rather than bottom up. However, even that thread linked > there talks about a client-facing api which we can piggyback for this use. > Say phoeinx-public-api or similar. > > On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <[email protected]> > wrote: > > > Hi Josh & Istvan > > > > Thanks Istvan for looking into this, I am also interested in solving this > > problem, > > Let me know how I can help? > > > > Thanks > > Jacob > > > > On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <[email protected]> wrote: > > > > > Thanks for trying to tackle this sticky problem, Istvan. For the > context > > > of everyone else, the real-life problem Istvan is trying to fix is that > > > you cannot run a Spark application with both HBase and Phoenix jars on > > > the classpath. > > > > > > If I understand this correctly, it's that the HBase API signatures are > > > different depending on whether we are "client side" or "server side" > > > (within a RegionServer). Your comment on PHOENIX-6053 shows that > > > (signatures on Table.java around Protobuf's Service class having shaded > > > relocation vs. the original com.google.protobuf coordinates). > > > > > > I think the reason we have the monolithic phoenix-core is that we have > > > so much logic which is executed on both the client and server side. For > > > example, we may push a filter operation to the server-side or we many > > > run it client-side. That's also why we have the "thin" phoenix-server > > > Maven module which just re-packages phoenix-core. > > > > > > Is it possible that we change phoenix-server so that it contains the > > > "server-side" code that we don't want to have using the HBase classes > > > with thirdparty relocations, rather than introduce another new Maven > > > module? > > > > > > Looking through your WIP PR too. > > > > > > On 4/7/21 1:10 AM, Istvan Toth wrote: > > > > Hi! > > > > > > > > I've been working on getting Phoenix working with > > > hbase-shaded-client.jar, > > > > and I am finally getting traction. > > > > > > > > One of the issues that I encountered is that we are mixing client and > > > > server side code in phoenix-core, and there's a > > > > mutual interdependence between the two. > > > > > > > > Fixing this is not hard, as it's mostly about replacing > > .class.getName() > > > s > > > > with string constants, and moving around some inconveniently placed > > > static > > > > utility methods, and now I have a WIP version where the client side > > > doesn't > > > > depend on server classes. > > > > > > > > However, unless we change the project structure, and factor out the > > > classes > > > > that depend on server-side APIs, this will be extremely fragile, as > any > > > > change can (and will) re-introduce the circular dependency between > the > > > > classes. > > > > > > > > To solve this issue I propose the following: > > > > > > > > - clean up phoenix-core, so that only classes that depend only on > > > > *hbase-client* (or at worst only on classes that are present in > > > > *hbase-shaded-client*) remain. This should be 90+% of the code > > > > - move all classes (mostly coprocessors and their support code) > > that > > > use > > > > the server API (*hbase-server* mostly) to a new module, say > > > > phoenix-coprocessors (the phoenix-server module name is taken). > > This > > > new > > > > class depends on phoenix-core. > > > > - move all classes that directly depend on MapReduce, and their > > > main() > > > > classes to the existing phoenix-tools module (which also depends > on > > > core) > > > > > > > > The separation would be primarily based on API use, at the first cut > > I'd > > > be > > > > fine with keeping all logic phoenix-core, and referencing that. We > may > > or > > > > may not want to move logic that is only used in coprocessors or > tools, > > > but > > > > doesn't use the respective APIs to the new modules later. > > > > > > > > As for the main artifacts: > > > > > > > > - *phoenix-server.jar* would include code from all three classes. > > > > - A newly added *phoenix-client-byo-shaded-hbase.jar *would > include > > > only > > > > the code from cleaned-up phoenix-core > > > > - Ideally, we'd remove the the tools and coprocessor code (and > > > > dependencies) from the standard and embedded clients, and switch > > > > documentation to use *phoenix-server* to run the MR tools, but > this > > > is > > > > optional. > > > > > > > > I am tracking this work in PHOENIX-6053, which has a (currently > > working) > > > > WIP patch attached. > > > > > > > > I think that this change would fit the pattern established by > creating > > > the > > > > phoenix-tools module, > > > > but as this is major change in project structure (even if the actual > > Java > > > > changes are trivial), > > > > I'd like to gather your input on this approach (please also speak up > if > > > you > > > > agree). > > > > > > > > regards > > > > Istvan > > > > > > > > > > -- *István Tóth* | Staff Software Engineer [email protected] <https://www.cloudera.com> [image: Cloudera] <https://www.cloudera.com/> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> <https://www.cloudera.com/> ------------------------------
