Istvan -- the mailing list stripped your attachment off, I believe :).

IIRC, Istvan's suggestion paves the way to make this (further) separation easier. With the changes he's proposing, we could further split the common module out into distinct pieces, and reduce what phoenix "server" requires.

On 4/18/21 9:13 PM, [email protected] wrote:
There is also another angle to look at. A long time ago I wrote this:

"
It seems Phoenix serves 4 distinct purposes:
1. Query parsing and compiling.
2. A type system
3. Query execution
4. Efficient HBase interface

Each of these is useful by itself, but we do not expose these as stable 
interfaces.
We have seen a lot of need to tie HBase into "higher level" service, such as 
Spark (and Presto, etc).
I think we can get a long way if we separate at least #1 (SQL) from the rest 
#2, #3, and #4 (Typed HBase Interface - THI).
Phoenix is used via SQL (#1), other tools such as Presto, Impala, Drill, Spark, 
etc, can interface efficiently with HBase via THI (#2, #3, and #4).
"

I still believe this is an additional useful demarcation for how to group the 
code. It also coincides somewhat with the server/client split.

Query parsing and the type system are client. Query execution and HBase 
interface are both client and server.
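
To make that mapping concrete, here is a minimal sketch of the four purposes and the side(s) each one runs on. The enum and method names are made up for illustration and are not Phoenix classes; the mapping itself is only the rough one stated above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names for illustration only; none of these are Phoenix classes.
enum Side { CLIENT, SERVER }

enum PhoenixComponent {
    QUERY_PARSING_AND_COMPILING(EnumSet.of(Side.CLIENT)),
    TYPE_SYSTEM(EnumSet.of(Side.CLIENT)),
    QUERY_EXECUTION(EnumSet.of(Side.CLIENT, Side.SERVER)),
    HBASE_INTERFACE(EnumSet.of(Side.CLIENT, Side.SERVER));

    private final Set<Side> sides;

    PhoenixComponent(Set<Side> sides) {
        this.sides = sides;
    }

    /** True if this component is needed on the given side. */
    boolean runsOn(Side side) {
        return sides.contains(side);
    }

    /** All components that a module for the given side would have to contain. */
    static Set<PhoenixComponent> neededOn(Side side) {
        EnumSet<PhoenixComponent> result = EnumSet.noneOf(PhoenixComponent.class);
        for (PhoenixComponent c : values()) {
            if (c.runsOn(side)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("client: " + neededOn(Side.CLIENT));
        System.out.println("server: " + neededOn(Side.SERVER));
    }
}
```

Read this way, a server-only artifact would need just the last two components, while the client needs all four.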

-- Lars

On Wednesday, April 14, 2021, 8:56:08 AM PDT, Istvan Toth <[email protected]> 
wrote:





Jacob, Josh and I had a discussion about the topic.

I'm attaching the dependency graph of the proposed modules.



On Fri, Apr 9, 2021 at 6:30 AM Istvan Toth <[email protected]> wrote:
The bulk of the changes I'm working on is indeed the separation of the client 
and the server side code.

Separating the MR related classes, and the tools-specific code (main, options 
parsing, etc) makes sense to me, if we don't mind adding another module.

In the first WIP iteration, I'm splitting out everything that depends on more than 
hbase-client into a "server" module.
Once that works, I will look at splitting that further into a real "server" and an 
"MR/tools" module.


My initial estimates about splitting the server side code were way too 
optimistic; we have to touch a lot of code to break circular dependencies 
between the client and server side. The changes are still quite trivial, but 
the patch is going to be huge and scary.


Tests are also going to be a problem: we're probably going to have to move 
most of them into the "server" module or a separate "tests" module, as the 
MiniCluster tests depend on code from each module.

The plan in PHOENIX-5483 and Lars's mail sound good, but I think that it would 
be more about dividing the "client-side" module further.
(BTW I think that making the indexing engine available separately would also be 
a popular feature.)



On Fri, Apr 9, 2021 at 5:39 AM Daniel Wong <[email protected]> wrote:
This is another project I am interested in as well as my group at
Salesforce.  We have had some discussions internally on this but I wasn't
aware of this specific Spark issue (We only allow phoenix access via spark
by default).  I think the approaches outlined are a good initial step but
we were also considering a larger breakup of phoenix-core.  I don't
think the desire for the larger step should stop us from doing the initial
ones Istvan and Josh proposed.  I think the high level plan makes sense
but I might prefer a different name than phoenix-tools for the ones we want
to be available to external libraries like phoenix-connectors.  Another
possible alternative is to restructure maybe less invasively by making
phoenix core like your proposed tools and making a phoenix-internal or
similar for the future.
One thing I was wondering was how much effort it was to split client/server
through phoenix-core...  Lars laid out a good component view of phoenix
whose first step might be PHOENIX-5483, but we could focus on highest
level separation rather than bottom up.  However, even that thread linked
there talks about a client-facing api which we can piggyback for this use.
Say phoenix-public-api or similar.

On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <[email protected]> wrote:

Hi Josh & Istvan

Thanks Istvan for looking into this. I am also interested in solving this
problem. Let me know how I can help.

Thanks
Jacob

On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <[email protected]> wrote:

Thanks for trying to tackle this sticky problem, Istvan. For the context
of everyone else, the real-life problem Istvan is trying to fix is that
you cannot run a Spark application with both HBase and Phoenix jars on
the classpath.

If I understand this correctly, it's that the HBase API signatures are
different depending on whether we are "client side" or "server side"
(within a RegionServer). Your comment on PHOENIX-6053 shows that
(signatures on Table.java around Protobuf's Service class having shaded
relocation vs. the original com.google.protobuf coordinates).
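
One way to see which coordinates an interface on the classpath actually uses is to reflect over its method signatures. The following is a minimal sketch; `SignatureProbe` and `methodsReferencing` are hypothetical names, and the demo uses JDK classes so that it runs without HBase. With HBase on the classpath, one would presumably pass the `Table` interface and the prefix `com.google.protobuf.` instead.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of HBase or Phoenix.
final class SignatureProbe {
    private SignatureProbe() {}

    /**
     * Returns the names of public methods on the given type whose return type
     * or parameter types live under the given package prefix.
     * Array types are not unwrapped in this sketch.
     */
    static Set<String> methodsReferencing(Class<?> type, String packagePrefix) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            boolean hit = m.getReturnType().getName().startsWith(packagePrefix);
            for (Class<?> p : m.getParameterTypes()) {
                hit |= p.getName().startsWith(packagePrefix);
            }
            if (hit) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Stand-in for the HBase case: which List methods mention java.util.function types?
        System.out.println(methodsReferencing(java.util.List.class, "java.util.function."));
    }
}
```

An empty result for the original prefix, combined with a non-empty result for the relocated one, would indicate that the jar on the classpath carries the shaded signatures.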

I think the reason we have the monolithic phoenix-core is that we have
so much logic which is executed on both the client and server side. For
example, we may push a filter operation to the server-side or we may
run it client-side. That's also why we have the "thin" phoenix-server
Maven module which just re-packages phoenix-core.

Is it possible that we change phoenix-server so that it contains the
"server-side" code that we don't want to have using the HBase classes
with thirdparty relocations, rather than introduce another new Maven
module?

Looking through your WIP PR too.

On 4/7/21 1:10 AM, Istvan Toth wrote:
Hi!

I've been working on getting Phoenix working with hbase-shaded-client.jar,
and I am finally getting traction.

One of the issues that I encountered is that we are mixing client and
server side code in phoenix-core, and there's a
mutual interdependence between the two.

Fixing this is not hard, as it's mostly about replacing .class.getName()s
with string constants, and moving around some inconveniently placed static
utility methods, and now I have a WIP version where the client side doesn't
depend on server classes.
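
As an illustration of the first kind of change, here is a minimal sketch. All names in it (`CoprocessorNames`, `ExampleRegionObserver`, the `org.example` package) are hypothetical and not taken from the Phoenix code base.

```java
// Hypothetical names throughout; this is a sketch, not the actual patch.
final class CoprocessorNames {
    private CoprocessorNames() {}

    // Before: client code referenced the server class directly, which forces
    // the server class (and its hbase-server dependencies) onto the client
    // compile classpath:
    //   String name = ExampleRegionObserver.class.getName();
    //
    // After: the client only carries the fully qualified name as a string.
    static final String EXAMPLE_REGION_OBSERVER =
            "org.example.coprocessor.ExampleRegionObserver";

    /** Returns true if the named class is visible to this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only check visibility, do not run static initializers.
            Class.forName(className, false, CoprocessorNames.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(EXAMPLE_REGION_OBSERVER + " loadable: "
                + isLoadable(EXAMPLE_REGION_OBSERVER));
    }
}
```

The trade-off is that the compiler no longer checks the name, so a test on the server side that compares each constant against the real class name would likely be needed to catch renames.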

However, unless we change the project structure, and factor out the classes
that depend on server-side APIs, this will be extremely fragile, as any
change can (and will) re-introduce the circular dependency between the
classes.

To solve this issue I propose the following:

      - clean up phoenix-core, so that only classes that depend only on
      *hbase-client* (or at worst only on classes that are present in
      *hbase-shaded-client*) remain. This should be 90+% of the code.
      - move all classes (mostly coprocessors and their support code) that
      use the server API (*hbase-server* mostly) to a new module, say
      phoenix-coprocessors (the phoenix-server module name is taken). This
      new module depends on phoenix-core.
      - move all classes that directly depend on MapReduce, and their
      main() classes to the existing phoenix-tools module (which also
      depends on core)
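
The module layout in the list above can be written down as a small dependency graph and checked for cycles. This is a minimal sketch: the class and method names are hypothetical, and the graph only encodes the three modules and the two "depends on core" edges named in the list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Phoenix build.
final class ModuleGraphSketch {
    private ModuleGraphSketch() {}

    /** Module -> modules it depends on, as described in the proposal. */
    static Map<String, List<String>> proposedModules() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("phoenix-core", Collections.<String>emptyList());
        deps.put("phoenix-coprocessors", Arrays.asList("phoenix-core"));
        deps.put("phoenix-tools", Arrays.asList("phoenix-core"));
        return deps;
    }

    /** Depth-first search that reports whether any dependency cycle exists. */
    static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String module : deps.keySet()) {
            if (visit(module, deps, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (done.contains(module)) {
            return false;               // fully explored earlier, no cycle through it
        }
        if (!onPath.add(module)) {
            return true;                // already on the current path: cycle
        }
        for (String dep : deps.getOrDefault(module, Collections.<String>emptyList())) {
            if (visit(dep, deps, done, onPath)) {
                return true;
            }
        }
        onPath.remove(module);
        done.add(module);
        return false;
    }

    public static void main(String[] args) {
        System.out.println("cycle in proposed layout: " + hasCycle(proposedModules()));
    }
}
```

With separate Maven modules, the build itself enforces this property, since a module cannot compile against classes from a module that depends on it.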

The separation would be primarily based on API use; at the first cut I'd be
fine with keeping all logic in phoenix-core, and referencing that. We may or
may not want to move logic that is only used in coprocessors or tools, but
doesn't use the respective APIs, to the new modules later.

As for the main artifacts:

      - *phoenix-server.jar* would include code from all three modules.
      - A newly added *phoenix-client-byo-shaded-hbase.jar* would include
      only the code from cleaned-up phoenix-core
      - Ideally, we'd remove the tools and coprocessor code (and
      dependencies) from the standard and embedded clients, and switch
      documentation to use *phoenix-server* to run the MR tools, but this
      is optional.

I am tracking this work in PHOENIX-6053, which has a (currently working)
WIP patch attached.

I think that this change would fit the pattern established by creating the
phoenix-tools module, but as this is a major change in project structure
(even if the actual Java changes are trivial), I'd like to gather your input
on this approach (please also speak up if you agree).

regards
Istvan






--
István Tóth  | Staff Software Engineer

[email protected]