[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203122#comment-15203122 ]
Hao Chen commented on KYLIN-1506:
---------------------------------
[[email protected]] could you please help review my patches as well?
> Refactor resource interface for timeseries-based data like jobs to much
> better performance
> ------------------------------------------------------------------------------------------
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.0, v1.4.0, v1.3.0
> Reporter: Hao Chen
> Assignee: Hao Chen
> Labels: patch
>
> h1. Problem
> Currently, operations such as getJobOutputs/getJobs scan the store twice to
> build a response. For example, the scan always proceeds as follows:
> 1. List the keys, sort them, and take the first and last key (which in fact
> is just a lookup by prefix filter) with "store.listResources(resourcePath)"
> 2. Re-scan the keys with a timestamp filter:
> "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
>         if (resources == null || resources.isEmpty()) {
>             return Collections.emptyList();
>         }
>         // Collections.sort(resources);
>         String rangeStart = resources.first();
>         String rangeEnd = resources.last();
>         return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis,
>                 ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
> h2. Solution
> In fact, the two scans can simply be combined into a single one:
> {code}
> store.getAllResources(resourcePath,startTime, endTime, Class, Serializer)
> store.getAllResources(resourcePath, Class, Serializer)
> {code}
> For example, "List<ExecutableOutputPO> getJobOutputs(long
> timeStartInMillis, long timeEndInMillis)" can be refactored as follows:
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT, timeStartInMillis, timeEndInMillis,
>                 ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
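To illustrate why the two approaches in the description should return the same rows, here is a minimal, self-contained sketch. It does not use Kylin's ResourceStore: the class InMemoryStoreSketch, its TreeMap backing, the scans counter, the example paths, and the half-open time range [start, end) are all assumptions made for illustration only. The point is that the first and last key returned by a prefix listing span exactly the prefix range, so the listing pass adds nothing that a prefix-based scan with a time filter cannot do alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a resource store (NOT Kylin's ResourceStore).
class InMemoryStoreSketch {
    // A stored resource: a payload plus a last-modified timestamp in milliseconds.
    static final class Entry {
        final String payload;
        final long timestamp;
        Entry(String payload, long timestamp) {
            this.payload = payload;
            this.timestamp = timestamp;
        }
    }

    private final NavigableMap<String, Entry> data = new TreeMap<>();
    int scans = 0; // number of passes made over the key space

    void put(String path, String payload, long timestamp) {
        data.put(path, new Entry(payload, timestamp));
    }

    // View of all keys strictly under "prefix/".
    private NavigableMap<String, Entry> underPrefix(String prefix) {
        String p = prefix.endsWith("/") ? prefix : prefix + "/";
        return data.subMap(p, true, p + Character.MAX_VALUE, false);
    }

    // Pass 1 of the old approach: list the keys under a prefix.
    NavigableSet<String> listResources(String prefix) {
        scans++;
        return new TreeSet<>(underPrefix(prefix).keySet());
    }

    // Pass 2 of the old approach: key-range scan with a time filter (assumed [start, end)).
    List<String> getAllResources(String startKey, String endKey, long start, long end) {
        scans++;
        return filter(data.subMap(startKey, true, endKey, true), start, end);
    }

    // Combined approach: prefix scan and time filter in a single pass.
    List<String> getAllResources(String prefix, long start, long end) {
        scans++;
        return filter(underPrefix(prefix), start, end);
    }

    private static List<String> filter(NavigableMap<String, Entry> range, long start, long end) {
        List<String> out = new ArrayList<>();
        for (Entry e : range.values()) {
            if (e.timestamp >= start && e.timestamp < end) {
                out.add(e.payload);
            }
        }
        return out;
    }

    // Mirrors the shape of the original getJobOutputs: two passes.
    List<String> getJobOutputsTwoPass(String root, long start, long end) {
        NavigableSet<String> resources = listResources(root);
        if (resources.isEmpty()) {
            return new ArrayList<>();
        }
        return getAllResources(resources.first(), resources.last(), start, end);
    }

    // Mirrors the shape of the refactored getJobOutputs: one pass.
    List<String> getJobOutputsOnePass(String root, long start, long end) {
        return getAllResources(root, start, end);
    }

    public static void main(String[] args) {
        InMemoryStoreSketch store = new InMemoryStoreSketch();
        store.put("/execute_output/a", "A", 10L);
        store.put("/execute_output/b", "B", 20L);
        store.put("/other/x", "X", 20L);
        System.out.println(store.getJobOutputsTwoPass("/execute_output", 15L, 25L));
        System.out.println(store.getJobOutputsOnePass("/execute_output", 15L, 25L));
        System.out.println("total scans: " + store.scans);
    }
}
```

In this sketch both variants return the same list, while the two-pass variant increments the scan counter twice per call. Against a remote store such as HBase, each extra pass would be an additional round of RPCs, which is presumably where the performance gain described in the issue comes from.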
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)