https://issues.apache.org/jira/browse/PHOENIX-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453854#comment-15453854
James Taylor commented on PHOENIX-3224:
---------------------------------------
So in theory, a big join is similar to an ORDER BY, in that there'd be a pause
while sorting is being done on both sides, but then you should start getting
results flowing back to the client after that. If you do an aggregation,
though, like a count(*) on the joined results, then you wouldn't get an
answer back until the client has seen all of the rows (which will take a long
time, as you mentioned, Lars). I think Phoenix would need some kind of shuffle
step in its processing model to handle this better.
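To make the streaming-versus-aggregation distinction concrete, here is a toy, single-process Python sketch. It is not Phoenix code: all function names are hypothetical, the join is a plain in-memory sort-merge join chosen to mirror the ORDER BY analogy above, and the "shuffle" is assumed to be simple hash partitioning by join key, which is one common way such a step is built in other engines.

```python
from typing import Any, Iterable, Iterator, List, Tuple

# A row is (join_key, payload); a joined row is (join_key, left_payload, right_payload).
Row = Tuple[Any, Any]
Joined = Tuple[Any, Any, Any]


def sort_merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Joined]:
    """Toy sort-merge join (hypothetical helper). Nothing is produced until
    both sides are sorted (the 'pause'); after that, matches stream out one
    at a time."""
    lhs = sorted(left, key=lambda row: row[0])   # blocking: consumes all of `left`
    rhs = sorted(right, key=lambda row: row[0])  # blocking: consumes all of `right`
    i = j = 0
    while i < len(lhs) and j < len(rhs):
        lkey, rkey = lhs[i][0], rhs[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row carrying the same key.
            j_end = j
            while j_end < len(rhs) and rhs[j_end][0] == lkey:
                j_end += 1
            while i < len(lhs) and lhs[i][0] == lkey:
                for k in range(j, j_end):
                    yield (lkey, lhs[i][1], rhs[k][1])  # streamed to the caller
                i += 1
            j = j_end


def count_joined(left: Iterable[Row], right: Iterable[Row]) -> int:
    """An aggregate such as COUNT(*) cannot answer until every joined row
    has been consumed, so the caller waits for the whole join."""
    return sum(1 for _ in sort_merge_join(left, right))


def shuffle(rows: Iterable[Row], num_partitions: int) -> List[List[Row]]:
    """Hash-partition rows by join key so equal keys land in the same partition."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[0]) % num_partitions].append(row)
    return parts


def shuffled_join_count(left: Iterable[Row], right: Iterable[Row],
                        num_partitions: int = 4) -> int:
    """With a shuffle step, each partition can be joined and counted
    independently (here sequentially, but in a real engine on separate
    workers); only small partial counts are combined at the end, instead
    of every joined row being shipped to one client."""
    left_parts = shuffle(left, num_partitions)
    right_parts = shuffle(right, num_partitions)
    partial_counts = [count_joined(lp, rp) for lp, rp in zip(left_parts, right_parts)]
    return sum(partial_counts)
```

For example, `next(sort_merge_join(a, b))` hands back the first match as soon as both sorts finish, whereas `count_joined(a, b)` returns only after the entire join has been walked. The partitioned version gives the same count, but the per-partition work is independent, which is what would let it be spread across a cluster.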
> Observations from large scale testing.
> --------------------------------------
>
> Key: PHOENIX-3224
> URL: https://issues.apache.org/jira/browse/PHOENIX-3224
> Project: Phoenix
> Issue Type: Task
> Reporter: Lars Hofhansl
>
> We have a >1000 node physical cluster at our disposal for a short time,
> before it'll be handed off to its intended use.
> Loaded a bunch of data (the TPC-H LINEITEM table, among others) and ran a bunch of
> queries. Most tables are between 100G and 500G (uncompressed) and between
> 600m and 2bn rows.
> The good news is that many things just worked. We sorted > 400G in < 5s with
> HBase and Phoenix. Scans work. Joins work (as long as one side is kept under
> 1m rows or so).
> For the issues we observed, I'll file sub-JIRAs under this.
> I'm going to write a blog post about this and attach a link here.