This is another project I am interested in as well as my group at
Salesforce.  We have had some discussions internally on this but I wasn't
aware of this specific Spark issue (We only allow phoenix access via spark
by default).  I think the approaches outlined are a good initial step but
we were also considering a larger breakup of phoenix-core.  I don't
think the desire for the larger step should stop us from doing the initial
ones Istavan and Josh proposed.  I think the high level plan makes sense
but I might prefer a different name than phoenix-tools for the ones we want
to be available to external libraries like phoenix-connectors.  Another
possible alternative is to restructure maybe less invasively by making
phoenix core like your proposed tools and making a phoenix-internal or
similar for the future.
One thing I was wondering was how much effort it was to split client/server
through phoenix-core...  Lars layed out a good component view of phoenix
whosethe first step might be PHOENIx-5483 but we could focus on highest
level separation rather than bottom up.  However, even that thread linked
there talks about a client-facing api which we can piggyback for this use.
Say phoeinx-public-api or similar.

On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <[email protected]> wrote:

> Hi Josh & Istvan
>
> Thanks Istvan for looking into this, I am also interested in solving this
> problem,
> Let me know how I can help?
>
> Thanks
> Jacob
>
> On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <[email protected]> wrote:
>
> > Thanks for trying to tackle this sticky problem, Istvan. For the context
> > of everyone else, the real-life problem Istvan is trying to fix is that
> > you cannot run a Spark application with both HBase and Phoenix jars on
> > the classpath.
> >
> > If I understand this correctly, it's that the HBase API signatures are
> > different depending on whether we are "client side" or "server side"
> > (within a RegionServer). Your comment on PHOENIX-6053 shows that
> > (signatures on Table.java around Protobuf's Service class having shaded
> > relocation vs. the original com.google.protobuf coordinates).
> >
> > I think the reason we have the monolithic phoenix-core is that we have
> > so much logic which is executed on both the client and server side. For
> > example, we may push a filter operation to the server-side or we many
> > run it client-side. That's also why we have the "thin" phoenix-server
> > Maven module which just re-packages phoenix-core.
> >
> > Is it possible that we change phoenix-server so that it contains the
> > "server-side" code that we don't want to have using the HBase classes
> > with thirdparty relocations, rather than introduce another new Maven
> > module?
> >
> > Looking through your WIP PR too.
> >
> > On 4/7/21 1:10 AM, Istvan Toth wrote:
> > > Hi!
> > >
> > > I've been working on getting Phoenix working with
> > hbase-shaded-client.jar,
> > > and I am finally getting traction.
> > >
> > > One of the issues that I encountered is that we are mixing client and
> > > server side code in phoenix-core, and there's a
> > > mutual interdependence between the two.
> > >
> > > Fixing this is not hard, as it's mostly about replacing
> .class.getName()
> > s
> > > with string constants, and moving around some inconveniently placed
> > static
> > > utility methods, and now I have a WIP version where the client side
> > doesn't
> > > depend on server classes.
> > >
> > > However, unless we change the project structure, and factor out the
> > classes
> > > that depend on server-side APIs, this will be extremely fragile, as any
> > > change can (and will) re-introduce the circular dependency between the
> > > classes.
> > >
> > > To solve this issue I propose the following:
> > >
> > >     - clean up phoenix-core, so that only classes that depend only on
> > >     *hbase-client* (or at worst only on classes that are present in
> > >     *hbase-shaded-client*) remain. This should be 90+% of the code
> > >     - move all classes (mostly coprocessors and their support code)
> that
> > use
> > >     the server API (*hbase-server* mostly) to a new module, say
> > >     phoenix-coprocessors (the phoenix-server module name is taken).
> This
> > new
> > >     class depends on phoenix-core.
> > >     - move all classes that directly depend on MapReduce, and their
> > main()
> > >     classes to the existing phoenix-tools module (which also depends on
> > core)
> > >
> > > The separation would be primarily based on API use, at the first cut
> I'd
> > be
> > > fine with keeping all logic phoenix-core, and referencing that. We may
> or
> > > may not want to move logic that is only used in coprocessors or tools,
> > but
> > > doesn't use the respective APIs to the new modules later.
> > >
> > > As for the main artifacts:
> > >
> > >     - *phoenix-server.jar* would include code from all three classes.
> > >     - A newly added *phoenix-client-byo-shaded-hbase.jar *would include
> > only
> > >     the code from cleaned-up phoenix-core
> > >     - Ideally, we'd remove the the tools and coprocessor code (and
> > >     dependencies) from the standard and embedded clients, and switch
> > >     documentation to use *phoenix-server* to run the MR tools, but this
> > is
> > >     optional.
> > >
> > > I am tracking this work in PHOENIX-6053, which has a (currently
> working)
> > > WIP patch attached.
> > >
> > > I think that this change would fit the pattern established by creating
> > the
> > > phoenix-tools module,
> > > but as this is major change in project structure (even if the actual
> Java
> > > changes are trivial),
> > > I'd like to gather your input on this approach (please also speak up if
> > you
> > > agree).
> > >
> > > regards
> > > Istvan
> > >
> >
>

Reply via email to