There is also another angle to look at. A long time ago I wrote this:

"
It seems Phoenix serves 4 distinct purposes:
1. Query parsing and compiling.
2. A type system
3. Query execution
4. Efficient HBase interface

Each of these is useful by itself, but we do not expose these as stable 
interfaces.
We have seen a lot of need to tie HBase into "higher level" service, such as 
Spark (and Presto, etc).
I think we can get a long way if we separate at least #1 (SQL) from the rest 
#2, #3, and #4 (Typed HBase Interface - THI).
Phoenix is used via SQL (#1), other tools such as Presto, Impala, Drill, Spark, 
etc, can interface efficiently with HBase via THI (#2, #3, and #4).
"

I still believe this is an additional useful demarcation for how to group the 
code. And coincided somewhat with server/client.

Query parsing and the type system are client. Query execution and HBase 
interface are both client and server.

-- Lars

On Wednesday, April 14, 2021, 8:56:08 AM PDT, Istvan Toth <[email protected]> 
wrote: 





Jacob, Josh and me had a discussion about the topic.

I'm attaching the dependency graph of the proposed modules



On Fri, Apr 9, 2021 at 6:30 AM Istvan Toth <[email protected]> wrote:
> The bulk of the changes I'm working on is indeed the separation of the client 
> and the server side code.
> 
> Separating the MR related classes, and the tools-specific code (main, options 
> parsing, etc) makes sense to me, if we don't mind adding another module.
> 
> In the first WIP iteration, I'm splitting out everything that depends on more 
> than hbase-client into a "server" module.
> Once that works I will look at splitting that further into a  real "server" 
> and an "MR/tools" module.
> 
> 
> My initial estimates about splitting the server side code were way too 
> optimistic, we have to touch a lot of code to break circular dependencies 
> between the client and server side. The changes are still quite trivial, but 
> the patch is going to be huge and scary.
> 
> 
> Tests are also going to be a problem, we're probably going to have to move 
> most of them into the "server" or a separate "tests" module, as the 
> MiniCluster tests depend on code from each module.
> 
> The plan in PHOENIX-5483, and Lars's mail sounds good, but I think that it 
> would be more about dividing the "client-side" module further. 
> (BTW I think that making the indexing engine available separately would also 
> be a popular feature )
> 
> 
> 
> On Fri, Apr 9, 2021 at 5:39 AM Daniel Wong <[email protected]> wrote:
>> This is another project I am interested in as well as my group at
>> Salesforce.  We have had some discussions internally on this but I wasn't
>> aware of this specific Spark issue (We only allow phoenix access via spark
>> by default).  I think the approaches outlined are a good initial step but
>> we were also considering a larger breakup of phoenix-core.  I don't
>> think the desire for the larger step should stop us from doing the initial
>> ones Istavan and Josh proposed.  I think the high level plan makes sense
>> but I might prefer a different name than phoenix-tools for the ones we want
>> to be available to external libraries like phoenix-connectors.  Another
>> possible alternative is to restructure maybe less invasively by making
>> phoenix core like your proposed tools and making a phoenix-internal or
>> similar for the future.
>> One thing I was wondering was how much effort it was to split client/server
>> through phoenix-core...  Lars layed out a good component view of phoenix
>> whosethe first step might be PHOENIx-5483 but we could focus on highest
>> level separation rather than bottom up.  However, even that thread linked
>> there talks about a client-facing api which we can piggyback for this use.
>> Say phoeinx-public-api or similar.
>> 
>> On Wed, Apr 7, 2021 at 9:43 AM Jacob Isaac <[email protected]> wrote:
>> 
>>> Hi Josh & Istvan
>>>
>>> Thanks Istvan for looking into this, I am also interested in solving this
>>> problem,
>>> Let me know how I can help?
>>>
>>> Thanks
>>> Jacob
>>>
>>> On Wed, Apr 7, 2021 at 9:05 AM Josh Elser <[email protected]> wrote:
>>>
>>> > Thanks for trying to tackle this sticky problem, Istvan. For the context
>>> > of everyone else, the real-life problem Istvan is trying to fix is that
>>> > you cannot run a Spark application with both HBase and Phoenix jars on
>>> > the classpath.
>>> >
>>> > If I understand this correctly, it's that the HBase API signatures are
>>> > different depending on whether we are "client side" or "server side"
>>> > (within a RegionServer). Your comment on PHOENIX-6053 shows that
>>> > (signatures on Table.java around Protobuf's Service class having shaded
>>> > relocation vs. the original com.google.protobuf coordinates).
>>> >
>>> > I think the reason we have the monolithic phoenix-core is that we have
>>> > so much logic which is executed on both the client and server side. For
>>> > example, we may push a filter operation to the server-side or we many
>>> > run it client-side. That's also why we have the "thin" phoenix-server
>>> > Maven module which just re-packages phoenix-core.
>>> >
>>> > Is it possible that we change phoenix-server so that it contains the
>>> > "server-side" code that we don't want to have using the HBase classes
>>> > with thirdparty relocations, rather than introduce another new Maven
>>> > module?
>>> >
>>> > Looking through your WIP PR too.
>>> >
>>> > On 4/7/21 1:10 AM, Istvan Toth wrote:
>>> > > Hi!
>>> > >
>>> > > I've been working on getting Phoenix working with
>>> > hbase-shaded-client.jar,
>>> > > and I am finally getting traction.
>>> > >
>>> > > One of the issues that I encountered is that we are mixing client and
>>> > > server side code in phoenix-core, and there's a
>>> > > mutual interdependence between the two.
>>> > >
>>> > > Fixing this is not hard, as it's mostly about replacing
>>> .class.getName()
>>> > s
>>> > > with string constants, and moving around some inconveniently placed
>>> > static
>>> > > utility methods, and now I have a WIP version where the client side
>>> > doesn't
>>> > > depend on server classes.
>>> > >
>>> > > However, unless we change the project structure, and factor out the
>>> > classes
>>> > > that depend on server-side APIs, this will be extremely fragile, as any
>>> > > change can (and will) re-introduce the circular dependency between the
>>> > > classes.
>>> > >
>>> > > To solve this issue I propose the following:
>>> > >
>>> > >     - clean up phoenix-core, so that only classes that depend only on
>>> > >     *hbase-client* (or at worst only on classes that are present in
>>> > >     *hbase-shaded-client*) remain. This should be 90+% of the code
>>> > >     - move all classes (mostly coprocessors and their support code)
>>> that
>>> > use
>>> > >     the server API (*hbase-server* mostly) to a new module, say
>>> > >     phoenix-coprocessors (the phoenix-server module name is taken).
>>> This
>>> > new
>>> > >     class depends on phoenix-core.
>>> > >     - move all classes that directly depend on MapReduce, and their
>>> > main()
>>> > >     classes to the existing phoenix-tools module (which also depends on
>>> > core)
>>> > >
>>> > > The separation would be primarily based on API use, at the first cut
>>> I'd
>>> > be
>>> > > fine with keeping all logic phoenix-core, and referencing that. We may
>>> or
>>> > > may not want to move logic that is only used in coprocessors or tools,
>>> > but
>>> > > doesn't use the respective APIs to the new modules later.
>>> > >
>>> > > As for the main artifacts:
>>> > >
>>> > >     - *phoenix-server.jar* would include code from all three classes.
>>> > >     - A newly added *phoenix-client-byo-shaded-hbase.jar *would include
>>> > only
>>> > >     the code from cleaned-up phoenix-core
>>> > >     - Ideally, we'd remove the the tools and coprocessor code (and
>>> > >     dependencies) from the standard and embedded clients, and switch
>>> > >     documentation to use *phoenix-server* to run the MR tools, but this
>>> > is
>>> > >     optional.
>>> > >
>>> > > I am tracking this work in PHOENIX-6053, which has a (currently
>>> working)
>>> > > WIP patch attached.
>>> > >
>>> > > I think that this change would fit the pattern established by creating
>>> > the
>>> > > phoenix-tools module,
>>> > > but as this is major change in project structure (even if the actual
>>> Java
>>> > > changes are trivial),
>>> > > I'd like to gather your input on this approach (please also speak up if
>>> > you
>>> > > agree).
>>> > >
>>> > > regards
>>> > > Istvan
>>> > >
>>> >
>>>
>> 
> 
> 
> -- 
> István Tóth  | Staff Software Engineer 
> 
> [email protected]  
>         
>  
> ________________________________ 
> 

Reply via email to