[jira] [Commented] (KYLIN-5879) use dataset iterator api to reduce driver memory use

pengfei.zhan (Jira) Sun, 14 Jul 2024 23:58:40 -0700


    [ 
https://issues.apache.org/jira/browse/KYLIN-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865898#comment-17865898
 ]


pengfei.zhan commented on KYLIN-5879:
-------------------------------------

For JDBC, we implement the underlying JDBC protocol on top of HTTP, so BI 
tools can fetch data through the JDBC interface as usual. We can modify the 
driver so that, while the ResultSet is being read, it pulls the next range of 
data over HTTP as soon as the locally buffered rows are exhausted. Each such 
request is a paged query that carries the request ID and the offset range. On 
the server side, the query result can be split into segments by distributed 
tasks, and the matching segment is returned according to the offset and limit.
This approach supports large-volume detail export on the BI side and ensures 
that KE does not run out of memory from loading a large result set at once.
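
A minimal sketch of the driver-side paging loop described above, written in 
Python for brevity rather than as the actual driver code. The names 
PagedResultSet, fetch_page, query_id and page_size are hypothetical and not 
part of any Kylin API; the fake server only slices an in-memory list by offset 
and limit, standing in for the HTTP endpoint.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]
# Hypothetical page fetcher: (query_id, offset, limit) -> rows in that range.
FetchPage = Callable[[str, int, int], Sequence[Row]]


class PagedResultSet:
    """Iterates over a query result, pulling one page at a time on demand."""

    def __init__(self, fetch_page: FetchPage, query_id: str, page_size: int = 1000):
        if page_size <= 0:
            raise ValueError("page_size must be positive")
        self._fetch_page = fetch_page
        self._query_id = query_id
        self._page_size = page_size
        self._buffer: List[Row] = []   # only the current page is held in memory
        self._pos = 0                  # read position inside the current page
        self._offset = 0               # offset of the next page to request
        self._exhausted = False

    def __iter__(self) -> Iterator[Row]:
        return self

    def __next__(self) -> Row:
        if self._pos >= len(self._buffer):
            if self._exhausted:
                raise StopIteration
            # Buffered rows are used up: request the next range from the server.
            self._buffer = list(
                self._fetch_page(self._query_id, self._offset, self._page_size)
            )
            self._pos = 0
            self._offset += len(self._buffer)
            # A short (or empty) page means there is nothing left to fetch.
            if len(self._buffer) < self._page_size:
                self._exhausted = True
            if not self._buffer:
                raise StopIteration
        row = self._buffer[self._pos]
        self._pos += 1
        return row


def make_fake_server(rows: Sequence[Row]):
    """Stand-in for the server: returns the slice selected by offset and limit."""
    calls: List[Tuple[str, int, int]] = []

    def fetch_page(query_id: str, offset: int, limit: int) -> Sequence[Row]:
        calls.append((query_id, offset, limit))
        return rows[offset:offset + limit]

    return fetch_page, calls


fetch_page, calls = make_fake_server([(i,) for i in range(5)])
result = list(PagedResultSet(fetch_page, "q-1", page_size=2))
# The five rows arrive in order; three page requests were issued.
```

With this shape the client never holds more than page_size rows at once, and 
the server only needs to keep the segmented result addressable by request ID.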

h1. Limitation

Clients that access the RESTful interface directly need to add the paging 
parameters to their requests.
JDBC and ODBC need to be implemented separately.

> use dataset iterator api to reduce driver memory use
> ----------------------------------------------------
>
>                 Key: KYLIN-5879
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5879
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Query Engine
>    Affects Versions: 5.0-alpha
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Major
>             Fix For: 5.0.0
>
>
> Full Scenario OLAP Service/Data Unification Service Outlet
> * The query result set should not be subject to a Limit.
> * Scenarios answered by hierarchical storage/detailed indexing can support 
> large-scale data exporting.
> * In scenarios answered by query pushdown, querying through the JDBC 
> provided by KE should not be significantly slower than using the JDBC 
> provided by the data source directly, such as querying Hive JDBC directly.
> * The amount of data returned is not limited by number of entries or by data 
> size, because customer data keeps growing and the analysis scenarios are not 
> fixed.
> * Customers expect to use data in a uniform manner, and the current 
> asynchronous query has limitations in these data usage scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
