Thanks for trying to tackle this sticky problem, Istvan. For the benefit
of everyone else, the real-life problem Istvan is trying to fix is that
you cannot run a Spark application with both HBase and Phoenix jars on
the classpath.
If I understand this correctly, the HBase API signatures differ
depending on whether we are "client side" or "server side" (within a
RegionServer). Your comment on PHOENIX-6053 shows this: the signatures
in Table.java that involve Protobuf's Service class use the shaded
(relocated) package in one case and the original com.google.protobuf
coordinates in the other.
I think the reason we have the monolithic phoenix-core is that we have
so much logic which is executed on both the client and server side. For
example, we may push a filter operation to the server side or we may
run it client-side. That's also why we have the "thin" phoenix-server
Maven module which just re-packages phoenix-core.
Could we instead change phoenix-server so that it contains the
"server-side" code (the code that must not use the HBase classes with
thirdparty relocations), rather than introducing yet another Maven module?
I'm looking through your WIP PR too.
On 4/7/21 1:10 AM, Istvan Toth wrote:
Hi!
I've been working on getting Phoenix working with hbase-shaded-client.jar,
and I am finally getting traction.
One of the issues that I encountered is that we are mixing client and
server side code in phoenix-core, and there's a
mutual interdependence between the two.
Fixing this is not hard, as it's mostly about replacing .class.getName()
calls with string constants and moving around some inconveniently placed
static utility methods. I now have a WIP version where the client side
doesn't depend on server classes.
However, unless we change the project structure and factor out the classes
that depend on server-side APIs, this will be extremely fragile, as any
change can (and will) re-introduce the circular dependency between the
classes.
To solve this issue I propose the following:
- clean up phoenix-core so that it retains only classes that depend solely
on *hbase-client* (or at worst only on classes that are present in
*hbase-shaded-client*). This should be 90+% of the code
- move all classes (mostly coprocessors and their support code) that use
the server API (*hbase-server* mostly) to a new module, say
phoenix-coprocessors (the phoenix-server module name is taken). This new
module depends on phoenix-core.
- move all classes that directly depend on MapReduce, and their main()
classes, to the existing phoenix-tools module (which also depends on core)
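The module split is what makes the one-way rule enforceable: Maven refuses to build a reactor whose modules depend on each other cyclically, whereas classes inside a single module can reference each other freely. As a toy illustration only, the sketch below models module dependencies as a directed graph and runs a depth-first cycle check. The graph uses the module names from the proposal; treating phoenix-server as a packaging module that depends on all three is an assumption for the example. ModuleGraph, hasCycle and proposed are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the proposed module layout; not part of the Phoenix build.
final class ModuleGraph {

    private static final int ON_STACK = 1;
    private static final int DONE = 2;

    // Depth-first search; reaching a node that is still on the stack means a cycle.
    static boolean hasCycle(Map<String, List<String>> deps) {
        Map<String, Integer> state = new HashMap<>();
        for (String node : deps.keySet()) {
            if (visit(node, deps, state)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> deps,
                                 Map<String, Integer> state) {
        int s = state.getOrDefault(node, 0);
        if (s == ON_STACK) {
            return true;  // back edge: node depends (transitively) on itself
        }
        if (s == DONE) {
            return false; // already fully explored, no cycle through here
        }
        state.put(node, ON_STACK);
        for (String dep : deps.getOrDefault(node, List.of())) {
            if (visit(dep, deps, state)) {
                return true;
            }
        }
        state.put(node, DONE);
        return false;
    }

    // The proposed layout: new modules depend on phoenix-core, never the reverse.
    static Map<String, List<String>> proposed() {
        Map<String, List<String>> m = new HashMap<>();
        m.put("phoenix-core", List.of());
        m.put("phoenix-coprocessors", List.of("phoenix-core"));
        m.put("phoenix-tools", List.of("phoenix-core"));
        // Assumption: phoenix-server packages the code from all three modules.
        m.put("phoenix-server", List.of("phoenix-core", "phoenix-coprocessors", "phoenix-tools"));
        return m;
    }
}
```

The proposed graph is acyclic. The current situation, where client and server classes inside phoenix-core reference each other, corresponds to a two-node cycle. Once those classes live in separate modules, that cycle would fail the build instead of slipping in unnoticed.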
The separation would be based primarily on API use. In the first cut I'd be
fine with keeping all logic in phoenix-core and referencing it from there.
Later, we may or may not want to move logic that is only used in
coprocessors or tools, but doesn't use the respective APIs, to the new
modules.
As for the main artifacts:
- *phoenix-server.jar* would include code from all three modules.
- A newly added *phoenix-client-byo-shaded-hbase.jar* would include only
the code from the cleaned-up phoenix-core
- Ideally, we'd remove the tools and coprocessor code (and
dependencies) from the standard and embedded clients, and switch the
documentation to use *phoenix-server* to run the MR tools, but this is
optional.
I am tracking this work in PHOENIX-6053, which has a (currently working)
WIP patch attached.
I think that this change would fit the pattern established by creating the
phoenix-tools module, but as this is a major change in project structure
(even if the actual Java changes are trivial), I'd like to gather your
input on this approach (please also speak up if you agree).
regards
Istvan