[ https://issues.apache.org/jira/browse/PHOENIX-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497617#comment-14497617 ]
James Taylor commented on PHOENIX-1779:
---------------------------------------
I'd also get rid of currentIterator() and just combine it with next(). Maybe
something like this:
{code}
+    @Override
+    public Tuple next() throws SQLException {
+        List<RoundRobinIteratorState> iterators;
+        while ((iterators = getIterators()).size() > 0) {
+            index = index % size;
+            RoundRobinIteratorState itrState = iterators.get(index);
+            PeekingResultIterator itr = itrState.iterator;
+            /*
+             * Pick up the iterator only if it is open and if it hasn't already
+             * fetched more than the scanner cache size of records.
+             */
+            if (itrState.numRecordsRead >= threshold) {
+                index = (index + 1) % size;
+            } else {
+                Tuple tuple = null;
+                if ((tuple = itrState.tuple) != null || (tuple = itr.peek()) != null) {
+                    itrState.tuple = null;
+                    itrState.numRecordsRead++;
+                    index = (index + 1) % size;
+                    if (itrState.numRecordsRead == threshold) {
+                        numScannersCacheExhausted++;
+                    }
+                    return tuple;
+                }
+                // The scanner is exhausted and no more records will be returned by it.
+                // Un-track and close iterator.
+                itr.close();
+                iterators.remove(index);
+            }
+        }
+        return null;
+    }
{code}
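For anyone who wants to try the round-robin logic outside Phoenix, here is a rough standalone sketch. It is not the patch's code: plain java.util.Iterator sources stand in for PeekingResultIterator, the names RoundRobinSketch, State and drain are made up for illustration, and the per-round reset of the read counts is only an assumption about what getIterators() (not shown above) would do once every scanner has used up its cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the iterator discussed above.
class RoundRobinSketch<T> {
    // Per-source bookkeeping, loosely mirroring RoundRobinIteratorState.
    private static final class State<T> {
        final Iterator<T> iterator;
        int numRecordsRead;
        State(Iterator<T> iterator) { this.iterator = iterator; }
    }

    private final List<State<T>> open = new ArrayList<>();
    private final int threshold;  // max records taken from one source per round
    private int index;            // position of the source to try next
    private int numAtThreshold;   // sources that used up their quota this round

    RoundRobinSketch(List<Iterator<T>> sources, int threshold) {
        if (threshold <= 0) throw new IllegalArgumentException("threshold must be positive");
        for (Iterator<T> s : sources) open.add(new State<>(s));
        this.threshold = threshold;
    }

    // Returns the next element, or null once every source is drained.
    // (Null elements are not supported, since null marks the end.)
    T next() {
        while (!open.isEmpty()) {
            // Assumption: once every open source has hit the threshold, a new
            // round starts. In Phoenix this is presumably where the next
            // batches would be fetched.
            if (numAtThreshold == open.size()) {
                for (State<T> s : open) s.numRecordsRead = 0;
                numAtThreshold = 0;
            }
            index = index % open.size();
            State<T> state = open.get(index);
            if (state.numRecordsRead >= threshold) {
                index++;                      // quota used up, skip for now
            } else if (state.iterator.hasNext()) {
                state.numRecordsRead++;
                if (state.numRecordsRead == threshold) numAtThreshold++;
                index++;
                return state.iterator.next();
            } else {
                open.remove(index);           // drained: un-track this source
            }
        }
        return null;
    }

    // Convenience helper: interleave the given lists into one list.
    static <T> List<T> drain(List<List<T>> sources, int threshold) {
        List<Iterator<T>> iterators = new ArrayList<>();
        for (List<T> s : sources) iterators.add(s.iterator());
        RoundRobinSketch<T> rr = new RoundRobinSketch<>(iterators, threshold);
        List<T> out = new ArrayList<>();
        for (T t = rr.next(); t != null; t = rr.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        List<String> out = drain(Arrays.asList(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1"),
                Arrays.asList("c1", "c2")), 2);
        System.out.println(out); // prints [a1, b1, c1, a2, c2, a3]
    }
}
```

Note that the sketch takes the modulus against the current size of the list on every pass, so removing a drained source cannot leave the index pointing past the end.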
> Parallelize fetching of next batch of records for scans corresponding to
> queries with no order by
> --------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-1779
> URL: https://issues.apache.org/jira/browse/PHOENIX-1779
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Samarth Jain
> Assignee: Samarth Jain
> Attachments: PHOENIX-1779.patch, PHOENIX-1779_v2.patch, wip.patch,
> wip3.patch, wipwithsplits.patch
>
>
> Today in Phoenix we parallelize the first execution of scans, i.e., we load
> only the first batch of records up to the scan's cache size in parallel.
> Loading of subsequent batches of records in scanners is essentially serial.
> This could be improved especially for queries, including the ones with no
> order by clauses, that do not need any kind of merge sort on the client.
> This could also potentially improve the performance of UPSERT SELECT
> statements that load data from one table and insert into another. One such
> use case is creating immutable indexes for tables that already have data.
> It could also potentially improve the performance of our MapReduce solution
> for bulk loading data by improving the speed of the loading/mapping phase.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)