[
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403898#comment-15403898
]
Taewoo Kim edited comment on ASTERIXDB-1556 at 8/2/16 5:01 PM:
---------------------------------------------------------------
It seems that the compiler doesn't set the hash table size for
external-group-by, in-memory-hash-join, and hash-group-by during
APIFramework.compileQuery(). Only the following settings are applied.
{code:title=APIFramework.java|borderStyle=solid}
AsterixCompilerProperties compilerProperties =
        AsterixAppContextInfo.getInstance().getCompilerProperties();
int frameSize = compilerProperties.getFrameSize();
int sortFrameLimit = (int) (compilerProperties.getSortMemorySize() / frameSize);
int groupFrameLimit = (int) (compilerProperties.getGroupMemorySize() / frameSize);
int joinFrameLimit = (int) (compilerProperties.getJoinMemorySize() / frameSize);
OptimizationConfUtil.getPhysicalOptimizationConfig().setFrameSize(frameSize);
OptimizationConfUtil.getPhysicalOptimizationConfig().setMaxFramesExternalSort(sortFrameLimit);
OptimizationConfUtil.getPhysicalOptimizationConfig().setMaxFramesExternalGroupBy(groupFrameLimit);
OptimizationConfUtil.getPhysicalOptimizationConfig().setMaxFramesForJoin(joinFrameLimit);
{code}
Here, only the frame limits are set. The hash table size, however, always stays
at 10,485,767, the default assigned in the PhysicalOptimizationConfig()
constructor:
{code:title=PhysicalOptimizationConfig.java|borderStyle=solid}
public PhysicalOptimizationConfig() {
    int frameSize = 32768;
    setInt(FRAMESIZE, frameSize);
    setInt(MAX_FRAMES_EXTERNAL_SORT, (int) (((long) 32 * MB) / frameSize));
    setInt(MAX_FRAMES_EXTERNAL_GROUP_BY, (int) (((long) 32 * MB) / frameSize));
    // use http://www.rsok.com/~jrm/printprimes.html to find prime numbers
    setInt(DEFAULT_HASH_GROUP_TABLE_SIZE, 10485767);
    setInt(DEFAULT_EXTERNAL_GROUP_TABLE_SIZE, 10485767);
    setInt(DEFAULT_IN_MEM_HASH_JOIN_TABLE_SIZE, 10485767);
}
{code}
Although there are three methods that could change the default table sizes,
they have no callers:
{code:title=PhysicalOptimizationConfig.java|borderStyle=solid}
public void setExternalGroupByTableSize(int tableSize) {
    setInt(DEFAULT_EXTERNAL_GROUP_TABLE_SIZE, tableSize);
}

public void setInMemHashJoinTableSize(int tableSize) {
    setInt(DEFAULT_IN_MEM_HASH_JOIN_TABLE_SIZE, tableSize);
}

public void setHashGroupByTableSize(int tableSize) {
    setInt(DEFAULT_HASH_GROUP_TABLE_SIZE, tableSize);
}
{code}
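As a rough illustration (a standalone sketch, not existing AsterixDB code; the class and method names are made up), the compiler could derive a table size from the configured memory budget instead of leaving the hard-coded prime, given that each slot later costs INT_SIZE * 2 bytes of header:

```java
// Hypothetical sketch: pick a hash table size from a memory budget instead of
// the fixed default. Names and sizing policy here are assumptions.
public class TableSizeSketch {
    static final int INT_SIZE = 4; // bytes per int, as in the Hyracks code

    // Each table slot costs INT_SIZE * 2 bytes of header space in
    // SerializableHashTable, so a budget of B bytes supports B / 8 slots.
    static int tableSizeForBudget(long headerBudgetBytes) {
        return (int) (headerBudgetBytes / (INT_SIZE * 2));
    }

    public static void main(String[] args) {
        long budget = 32L * 1024 * 1024; // e.g. a 32 MB group-by budget
        System.out.println(tableSizeForBudget(budget)); // 4194304 slots
    }
}
```

With such a policy, a 32 MB budget would cap the table at about 4.2 million slots rather than the 10.5 million default.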
I checked the hybrid-hash-join path, and there the callers that create a hash
table do adjust its size properly based on the number of tuples (i.e., the
file size). For group-by, however, there is no such adjustment. So
HashSpillableTableFactory.buildSpillableTable() always allocates 8 bytes
(INT_SIZE * 2) per entry of the default table size (10,485,767), which is
8 * 10,485,767 bytes, about 80 MB of header space. For example, with 8
partitions the engine always assigns 80 * 8 = 640 MB for each group-by
operator.
{code:title=SerializableHashTable.java|borderStyle=solid}
public SerializableHashTable(int tableSize, final IHyracksFrameMgrContext ctx)
        throws HyracksDataException {
    this.ctx = ctx;
    int frameSize = ctx.getInitialFrameSize();
    int residual = tableSize * INT_SIZE * 2 % frameSize == 0 ? 0 : 1;
    int headerSize = tableSize * INT_SIZE * 2 / frameSize + residual;
    headers = new IntSerDeBuffer[headerSize];
    IntSerDeBuffer frame = new IntSerDeBuffer(ctx.allocateFrame().array());
    contents.add(frame);
    frameCurrentIndex.add(0);
    frameCapacity = frame.capacity();
}
{code}
> Prefix-based multi-way Fuzzy-join generates an exception.
> ---------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join (more
> than 2-way), the system generates an out-of-memory exception.
> Since a fuzzy-join is created from 30-40 lines of AQL and this AQL is
> translated into a massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it can generate an out-of-memory exception.