Jacob, Josh and me had a discussion about the topic.

I'm attaching the dependency graph of the proposed modules



On Fri, Apr 9, 2021 at 6:30 AM Istvan Toth <[email protected]> wrote:

> The bulk of the changes I'm working on is indeed the separation of the
> client and the server side code.
>
> Separating the MR related classes, and the tools-specific code (main,
> options parsing, etc) makes sense to me, if we don't mind adding another
> module.
>
> In the first WIP iteration, I'm splitting out everything that depends on
> more than hbase-client into a "server" module.
> Once that works I will look at splitting that further into a  real
> "server" and an "MR/tools" module.
>
> My initial estimates about splitting the server side code were way too
> optimistic, we have to touch a lot of code to break circular dependencies
> between the client and server side. The changes are still quite trivial,
> but the patch is going to be huge and scary.
>
> Tests are also going to be a problem, we're probably going to have to move
> most of them into the "server" or a separate "tests" module, as the
> MiniCluster tests depend on code from each module.
>
> The plan in PHOENIX-5483, and Lars's mail sounds good, but I think that it
> would be more about dividing the "client-side" module further.
> (BTW I think that making the indexing engine available separately would
> also be a popular feature )
>
>
>
> On Fri, Apr 9, 2021 at 5:39 AM Daniel Wong <[email protected]> wrote:
>
>> This is another project I am interested in as well as my group at
>> Salesforce.  We have had some discussions internally on this but I wasn't
>> aware of this specific Spark issue (We only allow phoenix access via spark
>> by default).  I think the approaches outlined are a good initial step but
>> we were also considering a larger breakup of phoenix-core.  I don't
>> think the desire for the larger step should stop us from doing the initial
>> ones Istavan and Josh proposed.  I think the high level plan makes sense
>> but I might prefer a different name than phoenix-tools for the ones we
>> want
>> to be available to external libraries like phoenix-connectors.  Another
>> possible alternative is to restructure maybe less invasively by making
>> phoenix core like your proposed tools and making a phoenix-internal or
>> similar for the future.
>> One thing I was wondering was how much effort it was to split
>> client/server
>> through phoenix-core...  Lars layed out a good component view of phoenix
>> whosethe first step might be PHOENIx-5483 but we could focus on highest
>> level separation rather than bottom up.  However, even that thread linked
>> there talks about a client-facing api which we can piggyback for this use.
>> Say phoeinx-public-api or similar.
>>
>> On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <[email protected]>
>> wrote:
>>
>> > Hi Josh & Istvan
>> >
>> > Thanks Istvan for looking into this, I am also interested in solving
>> this
>> > problem,
>> > Let me know how I can help?
>> >
>> > Thanks
>> > Jacob
>> >
>> > On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <[email protected]> wrote:
>> >
>> > > Thanks for trying to tackle this sticky problem, Istvan. For the
>> context
>> > > of everyone else, the real-life problem Istvan is trying to fix is
>> that
>> > > you cannot run a Spark application with both HBase and Phoenix jars on
>> > > the classpath.
>> > >
>> > > If I understand this correctly, it's that the HBase API signatures are
>> > > different depending on whether we are "client side" or "server side"
>> > > (within a RegionServer). Your comment on PHOENIX-6053 shows that
>> > > (signatures on Table.java around Protobuf's Service class having
>> shaded
>> > > relocation vs. the original com.google.protobuf coordinates).
>> > >
>> > > I think the reason we have the monolithic phoenix-core is that we have
>> > > so much logic which is executed on both the client and server side.
>> For
>> > > example, we may push a filter operation to the server-side or we many
>> > > run it client-side. That's also why we have the "thin" phoenix-server
>> > > Maven module which just re-packages phoenix-core.
>> > >
>> > > Is it possible that we change phoenix-server so that it contains the
>> > > "server-side" code that we don't want to have using the HBase classes
>> > > with thirdparty relocations, rather than introduce another new Maven
>> > > module?
>> > >
>> > > Looking through your WIP PR too.
>> > >
>> > > On 4/7/21 1:10 AM, Istvan Toth wrote:
>> > > > Hi!
>> > > >
>> > > > I've been working on getting Phoenix working with
>> > > hbase-shaded-client.jar,
>> > > > and I am finally getting traction.
>> > > >
>> > > > One of the issues that I encountered is that we are mixing client
>> and
>> > > > server side code in phoenix-core, and there's a
>> > > > mutual interdependence between the two.
>> > > >
>> > > > Fixing this is not hard, as it's mostly about replacing
>> > .class.getName()
>> > > s
>> > > > with string constants, and moving around some inconveniently placed
>> > > static
>> > > > utility methods, and now I have a WIP version where the client side
>> > > doesn't
>> > > > depend on server classes.
>> > > >
>> > > > However, unless we change the project structure, and factor out the
>> > > classes
>> > > > that depend on server-side APIs, this will be extremely fragile, as
>> any
>> > > > change can (and will) re-introduce the circular dependency between
>> the
>> > > > classes.
>> > > >
>> > > > To solve this issue I propose the following:
>> > > >
>> > > >     - clean up phoenix-core, so that only classes that depend only
>> on
>> > > >     *hbase-client* (or at worst only on classes that are present in
>> > > >     *hbase-shaded-client*) remain. This should be 90+% of the code
>> > > >     - move all classes (mostly coprocessors and their support code)
>> > that
>> > > use
>> > > >     the server API (*hbase-server* mostly) to a new module, say
>> > > >     phoenix-coprocessors (the phoenix-server module name is taken).
>> > This
>> > > new
>> > > >     class depends on phoenix-core.
>> > > >     - move all classes that directly depend on MapReduce, and their
>> > > main()
>> > > >     classes to the existing phoenix-tools module (which also
>> depends on
>> > > core)
>> > > >
>> > > > The separation would be primarily based on API use, at the first cut
>> > I'd
>> > > be
>> > > > fine with keeping all logic phoenix-core, and referencing that. We
>> may
>> > or
>> > > > may not want to move logic that is only used in coprocessors or
>> tools,
>> > > but
>> > > > doesn't use the respective APIs to the new modules later.
>> > > >
>> > > > As for the main artifacts:
>> > > >
>> > > >     - *phoenix-server.jar* would include code from all three
>> classes.
>> > > >     - A newly added *phoenix-client-byo-shaded-hbase.jar *would
>> include
>> > > only
>> > > >     the code from cleaned-up phoenix-core
>> > > >     - Ideally, we'd remove the the tools and coprocessor code (and
>> > > >     dependencies) from the standard and embedded clients, and switch
>> > > >     documentation to use *phoenix-server* to run the MR tools, but
>> this
>> > > is
>> > > >     optional.
>> > > >
>> > > > I am tracking this work in PHOENIX-6053, which has a (currently
>> > working)
>> > > > WIP patch attached.
>> > > >
>> > > > I think that this change would fit the pattern established by
>> creating
>> > > the
>> > > > phoenix-tools module,
>> > > > but as this is major change in project structure (even if the actual
>> > Java
>> > > > changes are trivial),
>> > > > I'd like to gather your input on this approach (please also speak
>> up if
>> > > you
>> > > > agree).
>> > > >
>> > > > regards
>> > > > Istvan
>> > > >
>> > >
>> >
>>
>
>
> --
> *István Tóth* | Staff Software Engineer
> [email protected] <https://www.cloudera.com>
> [image: Cloudera] <https://www.cloudera.com/>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
> <https://www.cloudera.com/>
> ------------------------------
>

Reply via email to