[
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005265#comment-15005265
]
Jesus Camacho Rodriguez commented on HIVE-11531:
------------------------------------------------
[~huizane], thanks for working on this!
Agreed with the comments left by [~sershe]. In addition:
- This block in CalcitePlanner:
{code}
Integer offset =
qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null?
0:qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next()).getKey();
Integer limit =
qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null?
null:qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next()).getValue();
{code}
No need to create the iterator twice for offset (and for limit) and call next.
Could you cache the value to avoid unnecessary overhead and improve readability?
- A couple of additional style notes. There is typo in {{getOffetExpr}} method
in HiveSortLimit. We could rename old {{limit}} variable to {{fetch}} (for
instance, in line 2275 of CalcitePlanner) to avoid confusion. Please, pay
attention with code spacing (e.g. {{}else{}}).
- On a side note, maybe we should support the more verbose syntax too (as part
of this JIRA or a follow-up). In addition to the abbreviated syntax:
{noformat}
LIMIT (skip,)? n
{noformat}
that is mostly used by MySQL, we could support:
{noformat}
LIMIT n OFFSET skip
{noformat}
which is supported e.g. by MySQL and PostgreSQL.
[~sershe], Calcite should handle offset properly: support for offset was
already there. The only code that needs to be extended is in HIVE-11684; I will
update the patch accordingly so we do not ignore offset.
> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -----------------------------------------------------------------------------
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch,
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be
> paginated (which can be extremely large by itself). At present, ROW_NUMBER
> can be used to achieve this effect, but optimizations for LIMIT such as TopN
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for
> "skip" to existing limit, or improve ROW_NUMBER for better performance
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)