[
https://issues.apache.org/jira/browse/PHOENIX-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314638#comment-17314638
]
ASF GitHub Bot commented on PHOENIX-6436:
-----------------------------------------
lhofhansl opened a new pull request #1189:
URL: https://github.com/apache/phoenix/pull/1189
See Jira.
The OrderedResultIterator stores the topN values in either a BufferedQueue
or a SizeBoundQueue. Each limits the memory used to the passed thresholdBytes
(default is 20MB): the BufferedQueue spools to disk when it reaches that size,
while the SizeBoundQueue fails.
Hence we can additionally cap the worst-case memory estimate at
thresholdBytes.
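The cap described above can be sketched as follows. All names here are illustrative, not the actual Phoenix code: since both queue implementations never hold more than thresholdBytes in memory, the per-scan estimate never needs to exceed that value.

```java
public class OrderedResultMemoryEstimate {

    // Hypothetical sketch: cap the per-scan memory estimate at
    // thresholdBytes, because BufferedQueue spools to disk and
    // SizeBoundQueue fails before either exceeds that in-memory size.
    static long estimateSize(long limit, long offset,
                             long estimatedEntrySize, long thresholdBytes) {
        long uncapped = (limit + offset) * estimatedEntrySize;
        return Math.min(uncapped, thresholdBytes);
    }

    public static void main(String[] args) {
        long threshold = 20L * 1024 * 1024; // default threshold: 20MB
        // A "safety" limit of 10,000,000 rows at ~100 bytes each would
        // otherwise reserve ~1GB from the memory pool for every scan;
        // with the cap the estimate stays at 20MB.
        System.out.println(estimateSize(10_000_000L, 0L, 100L, threshold));
    }
}
```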
I noticed this when implementing limit and topN pushdown for the Trino
Phoenix connector: https://github.com/trinodb/trino/pull/7490
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> OrderedResultIterator overestimates memory requirements.
> --------------------------------------------------------
>
> Key: PHOENIX-6436
> URL: https://issues.apache.org/jira/browse/PHOENIX-6436
> Project: Phoenix
> Issue Type: Wish
> Reporter: Lars Hofhansl
> Priority: Major
>
> Just came across this.
> The size estimation is: {{(limit + offset) * estimatedEntrySize}},
> computed from just the passed limit and offset, and this estimate is applied
> to each individual scan.
> This is far too pessimistic when a large limit is passed merely as a safety
> measure.
> Suppose you pass a limit of 10,000,000. That is the overall limit, but
> Phoenix applies it to every scan (at least one per involved region) and
> reserves that much memory from the pool.
> Not sure what a better estimate would be. Ideally we'd divide by the number
> of involved regions (with some fudge factor), or use a size estimate of the
> region.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)