[
https://issues.apache.org/jira/browse/PHOENIX-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702394#comment-16702394
]
Josh Elser commented on PHOENIX-5044:
-------------------------------------
{quote}UPSERT/SELECT also seems to be faster - but of course that depends on
the network between the client and the server as well as the size of the update.
{quote}
Can you ballpark the sizes of what you're using to experiment? I can appreciate
the simplification of this, but I'm having a hard time believing that pushing
it through the client is always faster ;) (based on the limited comments
above). Didn't mutable index rebuilding use UPSERT-SELECTs, or is that also a
goal of your "pruning" that you're trying to work out, Lars? I would worry
that, with mutable secondary indexes as they currently exist, this would
dramatically increase the load on the MetadataEndpointCP as well as
dramatically increase the amount of time to rebuild an index (likely making the
automatic rebuild impossible for any modest data sizes).
Curious if [~sergey.soldatov] has any cycles to drop a comment, too.
Giving my impression from high in the sky: I think trying to support a massive
upsert-select within HBase itself is an anti-pattern. HBase isn't designed to
support that, and I think that's largely what you're trying to move us away
from, Lars. My big concern would be that, as we do remove this (and other)
anti-patterns, what is the tool of choice when users need to move large
"swaths" of data in Phoenix? E.g., could I give Phoenix my UPSERT SELECT
statement and have it executed as a MapReduce job to avoid bringing back 10TB
of data to a client?
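To make the "chunking and pacing" trade-off concrete, here is a minimal sketch
of what a client-driven write loop could look like. It is plain Python, not
Phoenix code: `write_batch` is a hypothetical callback standing in for whatever
executes and commits one batch of UPSERTs, and `batch_size` and `pause_seconds`
are made-up knobs for illustration only.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple  # hypothetical row type; a real client would use its own


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows from `rows`."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upsert_in_batches(
    rows: Iterable[Row],
    write_batch: Callable[[List[Row]], None],  # hypothetical: sends + commits one batch
    batch_size: int = 1000,
    pause_seconds: float = 0.0,
) -> int:
    """Write rows in bounded batches, optionally pausing between them.

    Returns the number of rows handed to `write_batch`.
    """
    written = 0
    for batch in chunked(rows, batch_size):
        write_batch(batch)           # chunking: one bounded batch per call
        written += len(batch)
        if pause_seconds > 0:
            time.sleep(pause_seconds)  # pacing: throttle to spare the servers
    return written
```

The point of the sketch is only that the client controls batch size and rate;
it says nothing about whether streaming the rows through the client is cheaper
than a server-side or MapReduce-based copy for a given data size.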
> Remove server side mutation code from Phoenix
> ---------------------------------------------
>
> Key: PHOENIX-5044
> URL: https://issues.apache.org/jira/browse/PHOENIX-5044
> Project: Phoenix
> Issue Type: Task
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Priority: Major
> Attachments: 5044-looksee-v2.txt, 5044-looksee-v3.txt,
> 5044-looksee.txt
>
>
> This is for *discussion*. Perhaps controversial.
> It generally seems to be a bad - if well-intentioned - idea to trigger
> mutations directly from the server. The main causes are UPSERT SELECT for the
> same table and DELETE FROM.
> IMHO, it's generally better to allow the client to handle this. There might
> be larger network overhead, but we get better chunking, better pacing, and
> behavior more in line with how HBase was intended to work.
> In PHOENIX-5026 I introduced a flag to disable server triggered mutations in
> the two cases mentioned above. I now think it's better to just remove the
> server code and also perform these from the client.
> (Note that server-side reads - aggregation, filters, etc. - are still insanely
> valuable and not affected by this.)
> Let's discuss.
> [~tdsilva], [[email protected]], [~jamestaylor], [~vincentpoon], [~gjacoby]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)