[jira] [Comment Edited] (ASTERIXDB-1556) Prefix-based multi-way Fuzzy-join generates an exception.

Taewoo Kim (JIRA) Thu, 04 Aug 2016 21:17:45 -0700

    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408840#comment-15408840
 ]


Taewoo Kim edited comment on ASTERIXDB-1556 at 8/5/16 4:16 AM:
---------------------------------------------------------------

My analysis so far on external-groupby:

The hash table consists of header frames plus content frames (which store the 
tuple pointers to the real tuples). Both header and content frames can be 
allocated incrementally, though the maximum number of header frames is limited 
to (initial entry size in bytes * 2 / frame size). The number of content 
frames can grow indefinitely. 
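
To make the header-frame bound concrete, here is a minimal sketch of the 
arithmetic. The function name and the sample numbers are hypothetical, and 
rounding up is an assumption, since the formula above does not say how a 
remainder is handled.

```python
def max_header_frames(initial_entry_bytes: int, frame_size: int) -> int:
    """Upper bound on header frames: initial entry size in bytes * 2 / frame size.

    Rounding up is an assumption made for this sketch.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (initial_entry_bytes * 2 + frame_size - 1) // frame_size


# Hypothetical numbers: 1 MiB of initial entries, 32 KiB frames.
print(max_header_frames(1 << 20, 32 * 1024))
```

The content frames have no such cap in this description, so only the header 
side can be sized up front from this formula.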

The data table is bounded by a limit that is calculated from the user 
configuration setting. So, once an insertion into the data table fails, a 
partition is spilled to disk. In this case, we currently reset the 
corresponding entries in the hash table. 

So, we need to set up a policy regarding the proportion between the hash table 
and the data table. And if allocating a frame for either the hash table or the 
data table fails, a spill should happen.
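
A rough sketch of that rule, assuming a single shared pool of frames: if a 
frame request from either table cannot be satisfied, one partition is spilled 
and its frames are returned to the pool. The class and method names are 
hypothetical, the "spill the largest partition" victim choice is an 
assumption, and the actual write to disk is omitted.

```python
class GroupByMemory:
    """Toy model of frame accounting for an external group-by (illustrative only)."""

    def __init__(self, total_frames: int, num_partitions: int):
        self.free = total_frames
        self.hash_frames = [0] * num_partitions
        self.data_frames = [0] * num_partitions
        self.spilled = []  # partitions spilled so far, in order

    def _spill_largest(self) -> None:
        # Assumption: pick the partition that holds the most frames.
        victim = max(
            range(len(self.data_frames)),
            key=lambda p: self.hash_frames[p] + self.data_frames[p],
        )
        # Writing the partition to disk is omitted. Its hash-table entries
        # are reset and all of its frames go back to the pool.
        self.free += self.hash_frames[victim] + self.data_frames[victim]
        self.hash_frames[victim] = 0
        self.data_frames[victim] = 0
        self.spilled.append(victim)

    def allocate(self, partition: int, kind: str) -> None:
        """Request one frame for `partition`; `kind` is "hash" or "data"."""
        if self.free == 0:
            # The same spill path is taken for both kinds of request.
            self._spill_largest()
        if self.free == 0:
            raise MemoryError("no frames available even after spilling")
        self.free -= 1
        frames = self.hash_frames if kind == "hash" else self.data_frames
        frames[partition] += 1


mem = GroupByMemory(total_frames=3, num_partitions=2)
mem.allocate(0, "data")
mem.allocate(0, "hash")
mem.allocate(1, "data")
mem.allocate(1, "hash")  # pool is empty here, so partition 0 is spilled first
print(mem.spilled, mem.free)
```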

{quote}
(3) We need to come up with a strategy. Possible choices are: 1) Data and 
hash-table dynamically share the entire budget. 2) Have a global budget, and 
let DATA and HASH-TABLE have a pre-defined proportion (e.g., data 80%, hash 
table 20%). Do not let either grow beyond its proportion. 3) Have two 
separate budgets and let DATA and HASH-TABLE each stick to its own.
{quote} 
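
The three choices in the quote can be compared as admission checks on a frame 
request. This is only a sketch: the function names, the frame counts, and the 
80/20 default are illustrative, and none of it reflects the current 
implementation.

```python
def shared_ok(kind, used_data, used_hash, total):
    """Choice 1: data and hash table dynamically share one budget."""
    return used_data + used_hash < total


def proportional_ok(kind, used_data, used_hash, total, data_pct=80):
    """Choice 2: one global budget split by a pre-defined proportion."""
    data_cap = total * data_pct // 100
    hash_cap = total - data_cap
    return used_data < data_cap if kind == "data" else used_hash < hash_cap


def separate_ok(kind, used_data, used_hash, data_budget, hash_budget):
    """Choice 3: two independently configured budgets."""
    return used_data < data_budget if kind == "data" else used_hash < hash_budget


# Hypothetical state: 100 frames in total, 70 used by data, 29 by the hash table.
print(shared_ok("hash", 70, 29, 100))        # one frame is still free overall
print(proportional_ok("hash", 70, 29, 100))  # hash table is past its 20-frame cap
print(separate_ok("hash", 70, 29, 80, 30))   # hash table is under its own budget
```

Under any of the three, a failed check would be the point where a spill is 
triggered; the choices differ only in how early that happens for each table.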

So, what percentage should we allocate to the hash table and what percentage 
to the data table, at least initially? We need this to decide the number of 
partitions in the data table and hash table. 


was (Author: wangsaeu):
My analysis so far on external-groupby:

The hash table consists of headers frame + content frame (stores the tuple 
pointer for the real tuple). Both Headers and content frame can be allocated 
though the maximum number of header frame is limited. That is equivalent to the 
"initial entry size in bytes * 2 / frame size). Content frame can be increased 
indefinitely. 

The data table is bounded by the number of limit that is calculated from the 
user configuration setting. So, once an insertion to the data table is failed, 
a partition is spilled to the disk. In this case, currently, we reset the 
corresponding entries in the hash table. 

So, we need to set up a policy regarding the proportion between the hash table 
and the data table. And, allocating a frame for hash table or allocating a 
frame for the data table fails, the spill should happen.

{quote}
(3) We need to come up with a strategy. Possible choices are: 1) Data and 
hash-table dynamically share the entire budget. 2) have a global budget, and 
let DATA and HASH-TABLE have pre-defined proportion (e.g., data -80%, hash 
table - 20%). Do not let each overgrow beyond the proportion. 3) have a two 
separate budget and let DATA and HASH-TABLE stick to it.
{quote} 

So, how much percentage should we allocate for hash table and how much for data 
table, at least initially since we need to decide the number of partitions in 
the data table and hash table. 

> Prefix-based multi-way Fuzzy-join generates an exception.
> ---------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join (more 
> than 2 ways), the system generates an out-of-memory exception. 
> Since a fuzzy-join is written using 30-40 lines of AQL code and this AQL is 
> translated into a massive number of operators (more than 200 operators in the 
> plan for a 3-way fuzzy join), it can run out of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
