[jira] [Comment Edited] (PHOENIX-2446) Immutable index - Index vs base table row count does not match when index is created during data load

Thomas D'Silva (JIRA) Thu, 14 Jan 2016 12:06:19 -0800

    [ https://issues.apache.org/jira/browse/PHOENIX-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098757#comment-15098757 ]


Thomas D'Silva edited comment on PHOENIX-2446 at 1/14/16 8:04 PM:
------------------------------------------------------------------

[~jamestaylor]

I have attached a log file that contains logging from HRegionServer and
HRegion. I used pherf to load data and created an index while the data was
being loaded. After the data load completed, there were 500,000 rows in the
data table but only 495,990 rows in the index table (4,010 rows were missing).
The CREATE INDEX statement returned 0 rows created.

I added a log line to HRegion.doMiniBatchMutation() after the call to
mvcc.completeMemstoreInsert(w), which advances mvcc and makes the batch visible
to scanners.
I also added a log line to HRegionServer.scan() when the server first gets the
scan, to print out the max timestamp of the scan, and another log line just
before the server returns the results to the client, to get the number of rows
returned to the client.
I only included logs for rows written that are not part of the incremental
index maintenance (all of these rows should have been picked up by the UPSERT
SELECT) and for scans that were part of the UPSERT SELECT.
From the log, all the scans complete before the call to advance the mvcc in
HRegion, so the scans all return 0 rows.
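To illustrate the visibility rule described above, here is a minimal Java sketch. It is a hypothetical, single-writer model and not HBase's actual MVCC class: `Region`, `beginBatch`, `completeBatch`, `openScanner`, and `scan` are illustrative names, and batches are assumed to complete in write-number order. A scanner captures the read point when it is opened, so a scan opened before the batch's read point advances returns 0 rows even though the cells are already in the memstore.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-writer model of an MVCC read point; this is NOT
// HBase's MultiVersionConcurrencyControl API, and all names are illustrative.
// Assumption: batches complete in write-number order.
public class MvccVisibilitySketch {

    /** A memstore cell tagged with the write number of the batch that wrote it. */
    static final class Cell {
        final String row;
        final long writeNumber;

        Cell(String row, long writeNumber) {
            this.row = row;
            this.writeNumber = writeNumber;
        }
    }

    static final class Region {
        private long nextWriteNumber = 1; // assigned to each new batch
        private long readPoint = 0;       // highest write number visible to new scanners
        private final List<Cell> memstore = new ArrayList<>();

        /** Writes a batch into the memstore; it is not yet visible to scanners. */
        synchronized long beginBatch(List<String> rows) {
            long w = nextWriteNumber++;
            for (String r : rows) {
                memstore.add(new Cell(r, w));
            }
            return w;
        }

        /** Plays the role of completing the memstore insert: advances the read point. */
        synchronized void completeBatch(long w) {
            readPoint = w; // valid only under the in-order completion assumption
        }

        /** A scanner captures the read point once, when it is opened. */
        synchronized long openScanner() {
            return readPoint;
        }

        /** Returns the rows visible to a scanner opened at the given read point. */
        synchronized List<String> scan(long scannerReadPoint) {
            List<String> out = new ArrayList<>();
            for (Cell c : memstore) {
                if (c.writeNumber <= scannerReadPoint) {
                    out.add(c.row);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Region region = new Region();
        long w = region.beginBatch(List.of("row1", "row2")); // in memstore, not yet visible
        long early = region.openScanner();                   // e.g. an UPSERT SELECT scan
        region.completeBatch(w);                             // read point advances afterwards
        System.out.println(region.scan(early).size());                // prints 0
        System.out.println(region.scan(region.openScanner()).size()); // prints 2
    }
}
```

The early scanner keeps returning nothing for that batch even after the read point moves, because it never re-reads the read point.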

If the UPSERT SELECT runs while rows are still being processed in HRegion,
before the call to advance the mvcc, then the UPSERT SELECT won't be able to
see these rows.
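The resulting gap can be sketched as a simple counting model. This is hypothetical Java built on two stated assumptions: a batch that starts processing after the index exists writes its own index rows (incremental maintenance), and the UPSERT SELECT sees only batches whose mvcc advance happened before its scan opened. A batch that meets neither condition never reaches the index. `Batch`, `indexRowCount`, and the tick values are illustrative, not taken from the log.

```java
import java.util.List;

// Hypothetical counting model of the race described above; class, field, and
// method names are illustrative, and times are abstract ticks, not HBase
// timestamps.
public class IndexPopulationRaceSketch {

    /** A write batch: when the server started processing it, and when it became visible. */
    static final class Batch {
        final int rows;
        final long startTime;   // assumed moment the server decides whether to maintain the index
        final long visibleTime; // moment the mvcc read point advances past the batch

        Batch(int rows, long startTime, long visibleTime) {
            this.rows = rows;
            this.startTime = startTime;
            this.visibleTime = visibleTime;
        }
    }

    static int baseRowCount(List<Batch> batches) {
        int total = 0;
        for (Batch b : batches) {
            total += b.rows;
        }
        return total;
    }

    /**
     * Index rows under the two assumptions above. A batch covered by both
     * paths is counted once, since its index row keys are the same either way.
     */
    static int indexRowCount(List<Batch> batches, long indexCreateTime, long scanOpenTime) {
        int total = 0;
        for (Batch b : batches) {
            boolean maintainedIncrementally = b.startTime >= indexCreateTime;
            boolean seenByUpsertSelect = b.visibleTime < scanOpenTime;
            if (maintainedIncrementally || seenByUpsertSelect) {
                total += b.rows;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Batch> batches = List.of(
                new Batch(10, 0, 1),  // visible well before the build: picked up by UPSERT SELECT
                new Batch(10, 4, 7),  // started before the index, still in flight at scan open: lost
                new Batch(10, 6, 8)); // started after the index exists: maintained incrementally
        long indexCreateTime = 5;
        long scanOpenTime = 6;
        System.out.println("base=" + baseRowCount(batches)
                + " index=" + indexRowCount(batches, indexCreateTime, scanOpenTime));
        // prints base=30 index=20
    }
}
```

In this model the counts only diverge when a batch starts before the index is created and becomes visible after the UPSERT SELECT scan opens; moving `scanOpenTime` past 7 brings both counts to 30.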

The total number of rows written is 8,020, which is twice the number of missing
rows (not sure why it's double).


was (Author: tdsilva):
[~jamestaylor]

I have attached a log file that contains logging from HRegionServer and
HRegion. I used pherf to load data and created an index while the data was
being loaded. After the data load completed, there were 500,000 rows in the
data table but only 495,990 rows in the index table (4,010 rows were missing).
The CREATE INDEX statement returned 0 rows created.

I added a log line to HRegion.doMiniBatchMutation() after the call to
mvcc.completeMemstoreInsert(w), which advances mvcc and makes the batch visible
to scanners.
I also added a log line to HRegionServer.scan() when the server first gets the
scan, to print out the max timestamp of the scan, and another log line just
before the server returns the results to the client, to get the number of rows
returned to the client.
I only included logs for rows written during the initial index population and
for scans that were part of the UPSERT SELECT.
From the log, all the scans complete before the call to advance the mvcc in
HRegion, so the scans all return 0 rows.

If the UPSERT SELECT runs while rows are still being processed in HRegion,
before the call to advance the mvcc happens, then the UPSERT SELECT won't be
able to see these rows.

The total number of rows written is 8,020, which is twice the number of missing
rows (not sure why).

> Immutable index - Index vs base table row count does not match when index is 
> created during data load
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2446
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2446
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>            Reporter: Mujtaba Chohan
>            Assignee: Thomas D'Silva
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2446-wip.patch, PHOENIX-2446.patch, server.log
>
>
> I'll add more details later, but here's the scenario that consistently
> produces a wrong row count for the index table vs the base table for an
> immutable async index.
> 1. Start data upsert
> 2. Create async index
> 3. Trigger M/R index build
> 4. Keep data upsert going in the background during steps 2 and 3 and for a
> while after the M/R index build finishes.
> 5. End data upsert.
> Now the count with the index enabled vs the count with a hint to not use the
> index is off by a large factor. Will get a cleaner repro for this issue soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
