[jira] [Commented] (PHOENIX-1779) Parallelize fetching of next batch of records for scans corresponding to queries with no order by

James Taylor (JIRA) Wed, 15 Apr 2015 23:21:44 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497617#comment-14497617
 ]


James Taylor commented on PHOENIX-1779:
---------------------------------------

I'd also get rid of currentIterator() and just combine it with next(). Maybe 
something like this:
{code}
+    @Override
+    public Tuple next() throws SQLException {
+        List<RoundRobinIteratorState> iterators;
+        while ((iterators = getIterators()).size() > 0) {
+            index = index % size;
+            RoundRobinIteratorState itrState = iterators.get(index);
+            PeekingResultIterator itr = itrState.iterator;
+            /*
+             * Pick up the iterator only if it is open and if it hasn't already
+             * fetched more than the scanner cache size of records.
+             */
+            if (itrState.numRecordsRead >= threshold) {
+                index = (index + 1) % size;
+            } else {
+                Tuple tuple = null;
+                if ((tuple = itrState.tuple) != null || (tuple = itr.peek()) != null) {
+                    itrState.tuple = null;
+                    itrState.numRecordsRead++;
+                    index = (index + 1) % size;
+                    if (itrState.numRecordsRead == threshold) {
+                        numScannersCacheExhausted++;
+                    }
+                    return tuple;
+                }
+                // The scanner is exhausted and no more records will be returned by it.
+                // Un-track and close iterator.
+                itr.close();
+                iterators.remove(index);
+            }
+        }
+        return null;
+    }
{code}
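
For anyone who wants to play with the control flow outside of Phoenix, here is a rough standalone sketch of the same round-robin idea. It is not the patch: ScanState and RoundRobinSketch are made-up names, plain java.util.Iterator stands in for PeekingResultIterator, records are assumed to be non-null, and the step where the real code would submit the next batch of scans is reduced to resetting the per-source counters.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical per-source state; a stand-in for illustration, not a Phoenix class. */
final class ScanState<T> {
    final Iterator<T> iterator;
    int numRecordsRead; // records handed out in the current batch

    ScanState(Iterator<T> iterator) {
        this.iterator = iterator;
    }
}

/** Round-robins over several sources, taking at most `threshold` records per source per batch. */
final class RoundRobinSketch<T> {
    private final List<ScanState<T>> states = new ArrayList<>();
    private final int threshold;
    private int index;          // position of the source to try next
    private int numAtThreshold; // sources that have used up their current batch

    RoundRobinSketch(List<Iterator<T>> sources, int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        for (Iterator<T> source : sources) {
            states.add(new ScanState<>(source));
        }
        this.threshold = threshold;
    }

    /** Returns the next record, or null once every source is exhausted. */
    T next() {
        while (!states.isEmpty()) {
            if (numAtThreshold == states.size()) {
                // Every remaining source has used up its batch. This is the point
                // where a real client would fetch the next batches; here we only reset.
                for (ScanState<T> s : states) {
                    s.numRecordsRead = 0;
                }
                numAtThreshold = 0;
            }
            index = index % states.size();
            ScanState<T> state = states.get(index);
            if (state.numRecordsRead >= threshold) {
                index++; // batch used up, skip this source for now
            } else if (state.iterator.hasNext()) {
                T record = state.iterator.next();
                state.numRecordsRead++;
                if (state.numRecordsRead == threshold) {
                    numAtThreshold++;
                }
                index++;
                return record;
            } else {
                // Source is exhausted: un-track it. The source that slides into
                // this slot is tried on the next pass of the loop.
                states.remove(index);
            }
        }
        return null;
    }
}
```

Keeping the skip, return and remove cases in a single loop is what lets currentIterator() go away: next() is the only place that moves the index.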

> Parallelize fetching of next batch of records for scans corresponding to 
> queries with no order by 
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1779
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1779
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-1779.patch, PHOENIX-1779_v2.patch, wip.patch, 
> wip3.patch, wipwithsplits.patch
>
>
> Today in Phoenix we parallelize only the first execution of scans, i.e. we 
> load only the first batch of records, up to the scan's cache size, in 
> parallel. Loading of subsequent batches of records in scanners is 
> essentially serial. This could be improved, especially for queries that do 
> not need any kind of merge sort on the client, including the ones with no 
> order by clause. This could also potentially improve the performance of 
> UPSERT SELECT statements that load data from one table and insert into 
> another; one such use case is creating immutable indexes for tables that 
> already have data. It could also potentially improve the performance of our 
> MapReduce solution for bulk loading data by improving the speed of the 
> loading/mapping phase. 
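
As a rough illustration of what the description above asks for, the following sketch runs one "fetch next batch" task per scanner on a thread pool instead of one after another. Everything in it is an assumption made for the example: ParallelBatchFetchSketch and fetchNextBatches are made-up names, each task is an arbitrary Callable standing in for a scanner fetch, and the pool is created per call only to keep the example short. It is not how Phoenix schedules scans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration; not part of Phoenix. */
final class ParallelBatchFetchSketch {

    /** Runs every fetch task on a thread pool and returns the batches in task order. */
    static <T> List<List<T>> fetchNextBatches(List<Callable<List<T>>> fetchTasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<List<T>> batches = new ArrayList<>();
            // invokeAll blocks until all tasks are done and returns the futures
            // in the same order as the submitted tasks.
            for (Future<List<T>> future : pool.invokeAll(fetchTasks)) {
                batches.add(future.get());
            }
            return batches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the queries in question need no merge sort on the client, the batches can be handed to the consumer in whatever order is convenient; the sketch keeps task order only because that is what invokeAll returns.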



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
