[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410502#comment-15410502
 ] 

Michael J. Carey commented on ASTERIXDB-1556:
---------------------------------------------

I don't think step (4) makes sense or is needed.  If the sum of the space (D+H) 
exceeds the budget, invoke the algorithm's current spilling logic - end of 
change.  We needn't change the spilling policy itself, not logically - we just 
have to change the definition of "too full" to consider the space being used to 
be D+H instead of D alone.  The rest of the logic should remain unchanged.  
Anything more than that seems like unnecessary complexity.  (Not sure what it 
would accomplish.)  Steps (1)-(3) make perfect sense and sound good/right to me.

If you want to clean this up even more, budget-wise, perhaps you could slightly 
change the logic to first ask the Hash Table how many frames it would need to 
add one entry.  Its answer could be 0 (which would almost always be the case), 
1, or 2.  You could then pass that info in to the Data Table buffer manager 
(i.e., tell it how big the insert will cause the total amount of HT space to 
be) so that it knows what the total impact of the operation would be on space 
used - and then it could make the more global decision itself.
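
To make the accounting concrete, here is a rough sketch in Java of the check 
described above.  It is only an illustration under stated assumptions: the 
class and method names (JoinMemoryBudget, wouldOverflow, tryReserve) are 
hypothetical and are not taken from the AsterixDB or Hyracks code.  The caller 
is assumed to have already asked the Hash Table how many extra frames (0, 1, or 
2) the insert would need and to pass that number in.

```java
// Hypothetical sketch: none of these names come from the AsterixDB code base.
final class JoinMemoryBudget {
    private final int budgetFrames; // total frames granted to the operator
    private int dataFrames;         // D: frames currently held by the Data Table
    private int hashFrames;         // H: frames currently held by the Hash Table

    JoinMemoryBudget(int budgetFrames, int dataFrames, int hashFrames) {
        this.budgetFrames = budgetFrames;
        this.dataFrames = dataFrames;
        this.hashFrames = hashFrames;
    }

    // "Too full" is judged on D + H after the insert, not on D alone.
    boolean wouldOverflow(int extraDataFrames, int extraHashFrames) {
        int total = dataFrames + extraDataFrames + hashFrames + extraHashFrames;
        return total > budgetFrames;
    }

    // Accounts for one insert. extraHashFrames is the Hash Table's own answer
    // (assumed to be 0, 1, or 2). Returns false, changing nothing, when the
    // caller should fall back to the existing, unchanged spilling logic.
    boolean tryReserve(int extraDataFrames, int extraHashFrames) {
        if (wouldOverflow(extraDataFrames, extraHashFrames)) {
            return false;
        }
        dataFrames += extraDataFrames;
        hashFrames += extraHashFrames;
        return true;
    }

    int usedFrames() {
        return dataFrames + hashFrames;
    }
}
```

In this sketch the spilling policy itself is untouched; the only thing that 
changes is the definition of "too full" that decides when it is invoked.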

Could you draw a picture of how memory is used when all this is happening and 
put it in the docs somewhere?  One thing I am uncertain about is how memory 
looks with multiple partitions, and I would like to be sure we've got things 
under proper control in that respect.  (I am wondering how things are set up to 
make spilling fairly efficient/painless.)


> Prefix-based multi-way Fuzzy-join generates an exception.
> ---------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply the multi-way fuzzy-join 
> (> 2), the system generates an out-of-memory exception. 
> Since a fuzzy-join is created using 30-40 lines of AQL code and this AQL is 
> translated into a massive number of operators (more than 200 operators in the 
> plan for a 3-way fuzzy join), it could generate an out-of-memory exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
