[jira] [Commented] (PHOENIX-1779) Parallelize fetching of next batch of records for scans corresponding to queries with no order by

Samarth Jain (JIRA) Thu, 26 Mar 2015 17:11:27 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383011#comment-14383011
 ]


Samarth Jain commented on PHOENIX-1779:
---------------------------------------

I verified manually that queries and upsert selects are working fine. Of course
that isn't in any way sufficient. However, modifying the existing tests to
handle this new condition, where rows are no longer returned in row key order,
is turning out to be a HUGE pain :). I thought I could simply use
BaseTest#assertValuesEqualsResultSet to verify that the correct rows were
returned in the result set, but it turned out to be pretty limiting. That
method essentially relies on Object.equals() to verify that the correct column
values were returned, which isn't always the right thing to do. For example,
rs.getLong() returns 0 for a null column value, although rs.getObject()
returns null.
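
To make the two problems above concrete, here is a minimal sketch in plain
Java. It is not the BaseTest code: the class UnorderedRows and its methods are
hypothetical names used only for illustration. It compares rows as a multiset
so that row order does not matter, and reads a long through wasNull() so that
SQL NULL is not confused with 0.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Phoenix's test utilities.
final class UnorderedRows {
    private UnorderedRows() {}

    /** Counts how often each row occurs. List equals/hashCode tolerate null elements. */
    static Map<List<Object>, Integer> countRows(List<List<Object>> rows) {
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (List<Object> row : rows) {
            // Copy the row so the map key does not alias the caller's list.
            counts.merge(new ArrayList<>(row), 1, Integer::sum);
        }
        return counts;
    }

    /** True if both lists hold the same rows with the same multiplicity, in any order. */
    static boolean sameRowsIgnoringOrder(List<List<Object>> expected,
                                         List<List<Object>> actual) {
        return countRows(expected).equals(countRows(actual));
    }

    /** Reads a BIGINT column, returning null for SQL NULL instead of getLong()'s 0. */
    static Long getNullableLong(ResultSet rs, int columnIndex) throws SQLException {
        long value = rs.getLong(columnIndex);
        // wasNull() reports whether the column just read was SQL NULL.
        return rs.wasNull() ? null : Long.valueOf(value);
    }
}
```

Rows read with getNullableLong keep null and 0 distinct, so a multiset
comparison like the one above would catch a row where one was returned in
place of the other.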

I am going to give our perf.py script a shot now and see what gains we are
looking at.

> Parallelize fetching of next batch of records for scans corresponding to 
> queries with no order by 
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1779
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1779
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: wip.patch
>
>
> Today in Phoenix we parallelize the first execution of scans, i.e. we load
> only the first batch of records, up to the scan's cache size, in parallel.
> Loading of subsequent batches of records in scanners is essentially serial.
> This could be improved, especially for queries that do not need any kind of
> merge sort on the client, including the ones with no order by clause.
> This could also potentially improve the performance of UPSERT SELECT
> statements that load data from one table and insert into another. One such
> use case is creating immutable indexes for tables that already have data.
> It could also potentially improve the performance of our MapReduce solution
> for bulk loading data by improving the speed of the loading/mapping phase.
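
The idea in the description can be sketched roughly as follows. This is a
generic illustration under stated assumptions, not the approach taken in
wip.patch: BatchSource is a hypothetical stand-in for a scanner, and
fetchAll is a hypothetical name. On every round, the next batch of each
still-active source is requested in parallel, instead of only the first one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names exist in Phoenix.
final class ParallelBatchFetch {
    private ParallelBatchFetch() {}

    /** Stand-in for one scanner: returns the next batch, or an empty list when exhausted. */
    interface BatchSource<T> {
        List<T> nextBatch() throws Exception;
    }

    /** Drains all sources, fetching one batch per active source in parallel on each round. */
    static <T> List<T> fetchAll(List<BatchSource<T>> sources, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<T> out = new ArrayList<>();
        List<BatchSource<T>> active = new ArrayList<>(sources);
        while (!active.isEmpty()) {
            // Submit the next-batch fetch for every active source before waiting on any.
            List<Future<List<T>>> futures = new ArrayList<>();
            for (BatchSource<T> source : active) {
                Callable<List<T>> task = source::nextBatch;
                futures.add(pool.submit(task));
            }
            // Collect results; a source that returned an empty batch is dropped.
            List<BatchSource<T>> stillActive = new ArrayList<>();
            for (int i = 0; i < active.size(); i++) {
                List<T> batch = futures.get(i).get();
                if (!batch.isEmpty()) {
                    out.addAll(batch);
                    stillActive.add(active.get(i));
                }
            }
            active = stillActive;
        }
        // Rows from different sources are interleaved, which is acceptable
        // only when the query does not require a client-side merge sort.
        return out;
    }
}
```

Because batches from different sources are interleaved, the result is no
longer in row key order, which is why this only applies to queries that do
not need a merge sort on the client.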



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
