[
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411087#comment-15411087
]
Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------
The structure of SpillableTale used in external hash group-by:
https://docs.google.com/presentation/d/1AExoTqQlx9va-AaiZ6OSPxBuQ3NJqz-cG5NGrjdk5FU/edit?usp=sharing
A spillableTable consists of Hash Table + Data Table.
1) Hash Table size (# of unique h() values): N (currently, it's just always
104,865,767).
2) Data Table size: D (#bytes in the user configuration / framesize in the user
configuration) (e.g., 32MB / 32768 = 1,024)
3) # of partitions in Data Table: currently D (it has an equation: filesize /
framesize. But, eventually it's D since filesize is given by maxframe *
framesize)
4) # entries per partition: N / D
Hash Table consists of three layers (header pointer array -> header frame ->
content frame ----> actual tuples in Data table).
# of entries in header pointer array: NHPA is given by (N * 8(2 int) /
framesize).
For example, if N is 1000 and framesize is 256, then 1000 * 8 / 256 = 32. There
will be 32 entries in the hash pointer array.
Once we get a h() value for a tuple t0, we also can find that h() value belongs
to which location of header pointer array since we know that each entry in the
hash pointer array can accommodate "framesize / 8 (two int)" contiguous hash
values. So, once we get the location of the header pointer array, we check
whether a header frame is allocated for the location. If not, we allocate one
frame. So, finally, the header pointer array can have NHPA frames at maximum.
This frame allocation happens incrementally.
The corresponding header frame has a pointer to the content frame that contains
actually hash slot for the tuplepointers that shares the same h() value.
Inside the content frame, there is a hash slot that has meta-data information
(capacity + #used count) and tuplepointers that are stored in Data Table.
When we check the header frame, if no content frame is allocated, then we
allocate a content frame. This also happens incrementally.
Inserting a tuple t0
1. Get h() value for t0.
2. We check whether Hash Table already contains the hash slot for h(). If so,
fetch all entries (tuple pointer) in the hash slot and check whether the key
field value of t0 already exists.
YES: call aggregateExistingTuple().
NO: insert t0 to Data Table first.
- a. calculate the target partition number by "h() value / #entries per
partition".
(e.g., if there can be 10 entries per partition and h() is 15, then it
goes to partition 1.)
- b. tries to insert t0 to the target partition. If the partition doesn't
have an allocated frame, it allocates one and insert t0 to there.
If there is not enough space in any frame of the current partition and
it can't allocate more frame from the frame pool (since all frames are
allocated), it tries to find a victim partition and spill that partition to the
disk. Also, reset the corresponding slot for the partition in Hash Table for
the future uses for the same h() value. And it repeats this process until the
insertion process for t0 is successful.
- c. Insert the tuple pointer of t0 (frame index, offset) into Hash Table.
First, it checks the header array of Hash Table. If the header array doesn't
contain the corresponding header frame for h() value of t0, then a new header
frame is allocated. It checks the header frame and gets the slot location for
h() value in the content frame.
- d. In the content frame location provided by the header frame, there is
the slot for the entries whose h() value is the same as t0. The structure is
[capacity][#used count][frameIndex1][offset1]...[frameIndexN][offsetN]. If
capacity - #usedcount is greater than 0, we insert tuplepointer of t0 in the
empty spot in the slot. If not, increase the capacity by two (e.g., 4, 8, 16,
...) and find a new space that can accommodate this size in the content frame.
If the latest content frame doesn't have enough space, allocate one more frame.
And migrate the current slot to there. Also update the pointer in the header
frame. The old space in the content frame will not be used again.
> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply the multi-way fuzzy-join ( >
> 2), the system generates an out-of-memory exception.
> Since a fuzzy-join is created using 30-40 lines of AQL codes and this AQL is
> translated into massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it could generate out-of-memory exception.
> /// Update: as the discussion goes, we found that hash table in the external
> hash group by doesn't conform to the frame limit. So, an out of memory
> exception happens during the execution of an external hash group by operator.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)