[ 
https://issues.apache.org/jira/browse/PHOENIX-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453323#comment-15453323
 ] 

Maryann Xue commented on PHOENIX-3224:
--------------------------------------

The sort-merge join takes inputs from both sides of the join in a streaming 
fashion, so its performance should be close to that of sorting both tables. 
There is a chance, though, that the sort-merge join has to cache a lot of data 
on the client side and performs really badly. This happens when a large number 
of rows from both sides share the same join keys. In that case there will be 
caching and backtracking to cross join all those rows.
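
To illustrate the caching and backtracking described above, here is a minimal 
sketch of a generic sort-merge join over two key-sorted inputs. It is not 
Phoenix's actual implementation; the function name `sort_merge_join` and the 
`max_buffered` counter are hypothetical and exist only to show how the buffer 
grows when many rows share a join key.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are already sorted by key.

    Returns the joined rows and the largest number of right-side rows that
    had to be held in memory at once (a stand-in for client-side caching).
    """
    out = []
    i = j = 0
    max_buffered = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1          # left side is behind, advance it
        elif lkey > rkey:
            j += 1          # right side is behind, advance it
        else:
            # Cache every right-side row that shares this join key.
            buffered = []
            while j < len(right) and right[j][0] == lkey:
                buffered.append(right[j][1])
                j += 1
            max_buffered = max(max_buffered, len(buffered))
            # Replay the cached rows for each matching left-side row.
            # This replay is the "backtracking" that yields the cross join.
            while i < len(left) and left[i][0] == lkey:
                for rval in buffered:
                    out.append((lkey, left[i][1], rval))
                i += 1
    return out, max_buffered


# Mostly distinct keys: the buffer stays tiny.
rows, buffered = sort_merge_join(
    [(1, "a"), (2, "b"), (2, "c"), (3, "d")],
    [(2, "x"), (2, "y"), (4, "z")],
)

# Heavily duplicated key: every right-side row with that key must be cached,
# and the output is the full cross product of the two groups.
skew_rows, skew_buffered = sort_merge_join(
    [(7, n) for n in range(100)],
    [(7, n) for n in range(50)],
)
```

With mostly distinct keys the join streams through both inputs with almost no 
state. With one heavily duplicated key, the cached group and the output both 
grow with the size of the duplicate groups, which is the bad case described 
above.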

> Observations from large scale testing.
> --------------------------------------
>
>                 Key: PHOENIX-3224
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3224
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Lars Hofhansl
>
> We have a >1000 node physical cluster at our disposal for a short time, 
> before it'll be handed off to its intended use.
> Loaded a bunch of data (TPC's LINEITEM table, among others) and ran a bunch of 
> queries. Most tables are between 100G and 500G (uncompressed) and between 
> 600m and 2bn rows.
> The good news is that many things just worked. We sorted > 400G in < 5s with 
> HBase and Phoenix. Scans work. Joins work (as long as one side is kept under 
> 1m rows or so).
> For the issues we observed, I'll file sub-JIRAs under this one.
> I'm going to write a blog post about this and attach a link here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
