[ 
https://issues.apache.org/jira/browse/PHOENIX-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453854#comment-15453854
 ] 

James Taylor commented on PHOENIX-3224:
---------------------------------------

So in theory, a big join is similar to an ORDER BY, in that there'd be a pause 
while sorting is being done on both sides, but then you should start getting 
results flowing back to the client after that. If you do an aggregation, 
though, like a count( * ) on the joined results, then you wouldn't get an 
answer back until the client has seen all of the rows (which will take a long 
time as you mentioned, Lars). I think Phoenix would need a kind of shuffle step 
in its processing model to handle this better.
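
To make the distinction above concrete, here is a minimal single-process sketch of a sort-merge join written as a generator. It is only an illustration, not Phoenix's implementation: the function name, the tuple row layout, and the sample data are all assumptions. It shows the blocking sort on both sides, the point where rows start streaming, and why a count over the joined rows has to drain the whole stream before it can answer.

```python
from itertools import groupby
from operator import itemgetter


def sort_merge_join(left, right, key=itemgetter(0)):
    """Yield (left_row, right_row) pairs whose join keys are equal.

    Hypothetical helper for illustration only; rows are plain tuples
    and the join key defaults to the first column.
    """
    # Blocking phase: nothing can be emitted until both sides are sorted.
    left_groups = groupby(sorted(left, key=key), key=key)
    right_groups = groupby(sorted(right, key=key), key=key)

    # Streaming phase: walk both sorted inputs in lockstep.
    done = (None, None)
    lkey, lrows = next(left_groups, done)
    rkey, rrows = next(right_groups, done)
    while lrows is not None and rrows is not None:
        if lkey < rkey:
            lkey, lrows = next(left_groups, done)
        elif lkey > rkey:
            rkey, rrows = next(right_groups, done)
        else:
            # Buffer one side of the matching group, then emit the pairs.
            rbuf = list(rrows)
            for lrow in lrows:
                for rrow in rbuf:
                    yield lrow, rrow
            lkey, lrows = next(left_groups, done)
            rkey, rrows = next(right_groups, done)


# Made-up sample data: (join_key, payload)
orders = [(2, "o-b"), (1, "o-a"), (2, "o-c")]
items = [(2, "i-x"), (3, "i-y"), (1, "i-z")]

joined = sort_merge_join(orders, items)
first = next(joined)                 # available right after the sort "pause"
total = 1 + sum(1 for _ in joined)   # a count must consume every joined row
```

The first joined row is available as soon as the sorts finish, while the count only completes after the consumer has seen every pair, which is the slow path described above when the consumer is the client.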

> Observations from large scale testing.
> --------------------------------------
>
>                 Key: PHOENIX-3224
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3224
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Lars Hofhansl
>
> We have a >1000 node physical cluster at our disposal for a short time, 
> before it'll be handed off to its intended use.
> Loaded a bunch of data (TPC's LINEITEM table, among others) and ran a bunch of 
> queries. Most tables are between 100G and 500G (uncompressed) and between 
> 600m and 2bn rows.
> The good news is that many things just worked. We sorted > 400G in < 5s with 
> HBase and Phoenix. Scans work. Joins work (as long as one side is kept under 
> 1m rows or so).
> For the issues we observed I'll file sub jiras under this.
> I'm going to write a blog post about this and attach a link here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
