richardstartin opened a new pull request, #8570:
URL: https://github.com/apache/pinot/pull/8570

   1. Replace `LinkedList` with `ArrayDeque` for storing search entries in the 
breadth-first traversal.
   2. Delay deserialization of most of the `OffHeapStarTreeNode` fields until 
the fields are used, because most are never used during a traversal.
   
   Change 1 is motivated by a profile of a StarTree over a high 
cardinality dimension, where the `LinkedList` grows large enough to cause 
problems: a large number of (doubly-linked) `LinkedList$Node` 
objects are allocated. `ArrayDeque` just stores the values in an array, which it 
maintains as a circular buffer.
   <img width="1027" alt="Screenshot 2022-04-20 at 13 31 00" 
src="https://user-images.githubusercontent.com/16439049/164230787-6c304c49-2601-4209-b98e-6d11242e3532.png">
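
   A minimal sketch of the queue swap in change 1, for illustration only. The `Node` class and `breadthFirst` method are hypothetical stand-ins, not Pinot's traversal code; the point is that `ArrayDeque` is a drop-in `Queue` that avoids allocating one linked node per entry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class BfsQueueSketch {
  // Hypothetical tree node used only for this example (not Pinot's StarTreeNode).
  static final class Node {
    final int id;
    final List<Node> children = new ArrayList<>();

    Node(int id) {
      this.id = id;
    }
  }

  // Breadth-first traversal that returns node ids in visit order.
  static List<Integer> breadthFirst(Node root) {
    List<Integer> visited = new ArrayList<>();
    // Previously this would have been `new LinkedList<>()`; ArrayDeque keeps
    // the entries in a resizable circular array instead of per-entry nodes.
    Queue<Node> queue = new ArrayDeque<>();
    queue.add(root);
    Node current;
    while ((current = queue.poll()) != null) {
      visited.add(current.id);
      queue.addAll(current.children);
    }
    return visited;
  }

  public static void main(String[] args) {
    Node root = new Node(0);
    Node left = new Node(1);
    Node right = new Node(2);
    root.children.add(left);
    root.children.add(right);
    left.children.add(new Node(3));
    System.out.println(breadthFirst(root)); // prints [0, 1, 2, 3]
  }
}
```

   Note that `ArrayDeque` rejects `null` elements, which is why `poll()` returning `null` can safely signal an empty queue here.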
   
   Change 2 is motivated by observing a lot of samples being taken within 
`OffHeapStarTreeNode$1.next`:
   <img width="847" alt="Screenshot 2022-04-20 at 13 35 01" 
src="https://user-images.githubusercontent.com/16439049/164231624-fbdeb8e7-b05f-43a6-8472-1c9e8d7fc885.png">
   
   There is also a high allocation rate of `OffHeapStarTreeNode` objects:
   <img width="1266" alt="Screenshot 2022-04-20 at 13 36 48" 
src="https://user-images.githubusercontent.com/16439049/164231771-1f9eed17-8860-48aa-9233-5b8fe18818de.png">
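
   The idea behind change 2 can be sketched roughly as follows. This is not the code in the PR: `java.nio.ByteBuffer` stands in for `PinotDataBuffer`, the seven-int node layout and its offsets are assumptions made for the example, and `-1` as the "no children" marker is likewise assumed. The node keeps only its id, child id range, and buffer reference, and reads everything else on demand.

```java
import java.nio.ByteBuffer;

public class LazyNodeSketch {
  // Assumed serialized layout: seven consecutive ints per node.
  static final int NODE_SIZE = 7 * Integer.BYTES;
  static final int DIMENSION_ID_OFFSET = 0;
  static final int DIMENSION_VALUE_OFFSET = 4;
  static final int FIRST_CHILD_ID_OFFSET = 20;
  static final int LAST_CHILD_ID_OFFSET = 24;

  static final class LazyNode {
    private final ByteBuffer buffer;
    private final int nodeId;
    private final int firstChildId;
    private final int lastChildId;

    LazyNode(ByteBuffer buffer, int nodeId) {
      this.buffer = buffer;
      this.nodeId = nodeId;
      // Only the child range is read eagerly, since traversal always needs it.
      int base = nodeId * NODE_SIZE;
      this.firstChildId = buffer.getInt(base + FIRST_CHILD_ID_OFFSET);
      this.lastChildId = buffer.getInt(base + LAST_CHILD_ID_OFFSET);
    }

    // Read from the buffer only when a caller actually asks for the value.
    int getDimensionId() {
      return buffer.getInt(nodeId * NODE_SIZE + DIMENSION_ID_OFFSET);
    }

    int getDimensionValue() {
      return buffer.getInt(nodeId * NODE_SIZE + DIMENSION_VALUE_OFFSET);
    }

    int getNumChildren() {
      // Assumption: -1 marks a leaf node.
      return firstChildId == -1 ? 0 : lastChildId - firstChildId + 1;
    }
  }

  // Builds a two-node buffer: node 0 has node 1 as its only child.
  static ByteBuffer sampleBuffer() {
    ByteBuffer buffer = ByteBuffer.allocate(2 * NODE_SIZE);
    int[] node0 = {3, 42, 0, 10, -1, 1, 1};
    int[] node1 = {4, 7, 0, 5, -1, -1, -1};
    for (int value : node0) {
      buffer.putInt(value);
    }
    for (int value : node1) {
      buffer.putInt(value);
    }
    return buffer;
  }

  public static void main(String[] args) {
    LazyNode root = new LazyNode(sampleBuffer(), 0);
    System.out.println(root.getNumChildren());    // prints 1
    System.out.println(root.getDimensionValue()); // prints 42
  }
}
```

   The trade-off is a buffer read on every accessor call instead of one at construction, which pays off when most fields are never requested during a traversal.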
   
   Reducing the number of fields shrinks each instance by a third, from 48 to 
32 bytes, when compressed references are enabled: 
   
   before:
   ```
   org.apache.pinot.segment.local.startree.OffHeapStarTreeNode object internals:
   OFF  SZ                                                  TYPE DESCRIPTION                            VALUE
     0   8                                                       (object header: mark)                  N/A
     8   4                                                       (object header: class)                 N/A
    12   4                                                   int OffHeapStarTreeNode._dimensionId       N/A
    16   4                                                   int OffHeapStarTreeNode._dimensionValue    N/A
    20   4                                                   int OffHeapStarTreeNode._startDocId        N/A
    24   4                                                   int OffHeapStarTreeNode._endDocId          N/A
    28   4                                                   int OffHeapStarTreeNode._aggregatedDocId   N/A
    32   4                                                   int OffHeapStarTreeNode._firstChildId      N/A
    36   4                                                   int OffHeapStarTreeNode._lastChildId       N/A
    40   4   org.apache.pinot.segment.spi.memory.PinotDataBuffer OffHeapStarTreeNode._dataBuffer        N/A
    44   4                                                       (object alignment gap)
   Instance size: 48 bytes
   Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
   ```
   
   after:
   ```
   org.apache.pinot.segment.local.startree.OffHeapStarTreeNode object internals:
   OFF  SZ                                                  TYPE DESCRIPTION                         VALUE
     0   8                                                       (object header: mark)               N/A
     8   4                                                       (object header: class)              N/A
    12   4                                                   int OffHeapStarTreeNode._nodeId         N/A
    16   4                                                   int OffHeapStarTreeNode._firstChildId   N/A
    20   4                                                   int OffHeapStarTreeNode._lastChildId    N/A
    24   4   org.apache.pinot.segment.spi.memory.PinotDataBuffer OffHeapStarTreeNode._dataBuffer     N/A
    28   4                                                       (object alignment gap)
   Instance size: 32 bytes
   Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
   ```
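
   The two instance sizes follow from simple arithmetic, sketched below. The `instanceSize` helper is hypothetical and assumes what the layouts above show: a 12-byte header (8-byte mark word plus 4-byte compressed class pointer), 4-byte `int` fields, 4-byte compressed references, and 8-byte object alignment.

```java
public class InstanceSizeSketch {
  // Estimates the instance size in bytes under the assumptions stated above.
  static int instanceSize(int intFields, int referenceFields) {
    int headerBytes = 8 + 4; // mark word + compressed class pointer
    int rawBytes = headerBytes + 4 * intFields + 4 * referenceFields;
    int alignment = 8;
    // Round up to the next multiple of the object alignment.
    return (rawBytes + alignment - 1) / alignment * alignment;
  }

  public static void main(String[] args) {
    System.out.println(instanceSize(7, 1)); // before: prints 48
    System.out.println(instanceSize(3, 1)); // after: prints 32
  }
}
```

   In both cases the raw size (44 and 28 bytes) ends 4 bytes short of an 8-byte boundary, which accounts for the alignment gap in each layout.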
    

