This is another project I am interested in, as is my group at Salesforce. We have had some internal discussions on this, but I wasn't aware of this specific Spark issue (we only allow Phoenix access via Spark by default). I think the approaches outlined are a good initial step, but we were also considering a larger breakup of phoenix-core. I don't think the desire for the larger step should stop us from doing the initial ones Istvan and Josh proposed.

I think the high-level plan makes sense, but I might prefer a different name than phoenix-tools for the pieces we want to be available to external libraries like phoenix-connectors. Another, possibly less invasive, alternative is to restructure by making phoenix-core look like your proposed tools module and making a phoenix-internal or similar for the future. One thing I was wondering was how much effort it would be to split client/server throughout phoenix-core.

Lars laid out a good component view of Phoenix whose first step might be PHOENIX-5483, but we could focus on the highest-level separation rather than working bottom-up. However, even the thread linked there talks about a client-facing API, which we could piggyback on for this use. Say phoenix-public-api or similar.
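To make the "top-down" alternative concrete, one possible parent-POM module layout might look like the sketch below. The module names phoenix-public-api and phoenix-internal are placeholders from the discussion above, not settled names, and the groupings are only illustrative:

```xml
<!-- Hypothetical sketch of a top-down module split; names are placeholders. -->
<modules>
  <!-- stable, client-facing API for external consumers such as phoenix-connectors -->
  <module>phoenix-public-api</module>
  <!-- client-side logic, depending only on hbase-client / hbase-shaded-client -->
  <module>phoenix-core</module>
  <!-- coprocessors and other code that needs the hbase-server API -->
  <module>phoenix-coprocessors</module>
  <!-- MapReduce tools and main() entry points -->
  <module>phoenix-tools</module>
</modules>
```

The key property is that phoenix-public-api and phoenix-core would never declare a dependency on the server-side modules, so the circular client/server dependency cannot silently reappear at build time.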
On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <[email protected]> wrote:

> Hi Josh & Istvan
>
> Thanks Istvan for looking into this, I am also interested in solving this
> problem.
> Let me know how I can help?
>
> Thanks
> Jacob
>
> On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <[email protected]> wrote:
>
> > Thanks for trying to tackle this sticky problem, Istvan. For the context
> > of everyone else, the real-life problem Istvan is trying to fix is that
> > you cannot run a Spark application with both HBase and Phoenix jars on
> > the classpath.
> >
> > If I understand this correctly, it's that the HBase API signatures are
> > different depending on whether we are "client side" or "server side"
> > (within a RegionServer). Your comment on PHOENIX-6053 shows that
> > (signatures on Table.java around Protobuf's Service class having shaded
> > relocation vs. the original com.google.protobuf coordinates).
> >
> > I think the reason we have the monolithic phoenix-core is that we have
> > so much logic which is executed on both the client and server side. For
> > example, we may push a filter operation to the server side or we may
> > run it client-side. That's also why we have the "thin" phoenix-server
> > Maven module which just re-packages phoenix-core.
> >
> > Is it possible that we change phoenix-server so that it contains the
> > "server-side" code that we don't want to have using the HBase classes
> > with thirdparty relocations, rather than introduce another new Maven
> > module?
> >
> > Looking through your WIP PR too.
> >
> > On 4/7/21 1:10 AM, Istvan Toth wrote:
> > > Hi!
> > >
> > > I've been working on getting Phoenix working with hbase-shaded-client.jar,
> > > and I am finally getting traction.
> > >
> > > One of the issues that I encountered is that we are mixing client and
> > > server side code in phoenix-core, and there's a
> > > mutual interdependence between the two.
> > > Fixing this is not hard, as it's mostly about replacing .class.getName()s
> > > with string constants, and moving around some inconveniently placed static
> > > utility methods, and now I have a WIP version where the client side doesn't
> > > depend on server classes.
> > >
> > > However, unless we change the project structure, and factor out the classes
> > > that depend on server-side APIs, this will be extremely fragile, as any
> > > change can (and will) re-introduce the circular dependency between the
> > > classes.
> > >
> > > To solve this issue I propose the following:
> > >
> > >    - clean up phoenix-core, so that only classes that depend only on
> > >    *hbase-client* (or at worst only on classes that are present in
> > >    *hbase-shaded-client*) remain. This should be 90+% of the code
> > >    - move all classes (mostly coprocessors and their support code) that use
> > >    the server API (*hbase-server* mostly) to a new module, say
> > >    phoenix-coprocessors (the phoenix-server module name is taken). This new
> > >    module depends on phoenix-core.
> > >    - move all classes that directly depend on MapReduce, and their main()
> > >    classes to the existing phoenix-tools module (which also depends on core)
> > >
> > > The separation would be primarily based on API use; at the first cut I'd be
> > > fine with keeping all logic in phoenix-core, and referencing that. We may or
> > > may not want to move logic that is only used in coprocessors or tools, but
> > > doesn't use the respective APIs, to the new modules later.
> > >
> > > As for the main artifacts:
> > >
> > >    - *phoenix-server.jar* would include code from all three modules.
> > >    - A newly added *phoenix-client-byo-shaded-hbase.jar* would include only
> > >    the code from the cleaned-up phoenix-core
> > >    - Ideally, we'd remove the tools and coprocessor code (and
> > >    dependencies) from the standard and embedded clients, and switch
> > >    documentation to use *phoenix-server* to run the MR tools, but this is
> > >    optional.
> > >
> > > I am tracking this work in PHOENIX-6053, which has a (currently working)
> > > WIP patch attached.
> > >
> > > I think that this change would fit the pattern established by creating the
> > > phoenix-tools module, but as this is a major change in project structure
> > > (even if the actual Java changes are trivial), I'd like to gather your
> > > input on this approach (please also speak up if you agree).
> > >
> > > regards
> > > Istvan
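The .class.getName() cleanup Istvan describes above can be sketched roughly as follows. The idea is that client-side code which only needs a coprocessor's fully qualified name for a string (e.g. a table descriptor attribute) should not take a compile-time dependency on the server-side class itself. The class and constant names below are illustrative placeholders, not a claim about the actual Phoenix code:

```java
// Hypothetical sketch of the dependency-breaking refactor: replacing a
// Class#getName() reference with a string constant so client-side code no
// longer needs the server-side coprocessor class on its classpath.
public final class CoprocessorNames {

    // Before: client code might write
    //     descriptor.setCoprocessor(SomeRegionObserver.class.getName());
    // which forces a compile-time (and runtime classpath) dependency on the
    // server-side class, re-creating the circular dependency.

    // After: the fully qualified name is carried as a string constant, so the
    // client module compiles and runs without the server module present.
    public static final String UNGROUPED_AGG_OBSERVER =
            "org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver";

    private CoprocessorNames() {
        // constants holder, not instantiable
    }

    public static void main(String[] args) {
        // The constant carries the same information Class#getName() would
        // have produced, minus the class-loading dependency.
        System.out.println(UNGROUPED_AGG_OBSERVER);
    }
}
```

The fragility Istvan mentions is that nothing stops a later change from reintroducing a direct class reference, which is why the follow-up step of moving the server-side classes into a separate Maven module matters: once the classes live in a module that phoenix-core does not depend on, such a reference becomes a compile error instead of a silent regression.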
