[
https://issues.apache.org/jira/browse/PHOENIX-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas D'Silva updated PHOENIX-2995:
------------------------------------
Attachment: PHOENIX-2995-v4.patch
[~jamestaylor] [~samarthjain]
Thanks for the review. I attached a patch without whitespace changes.
I moved new PhoenixConnection() out of the synchronized block. Previously we
always looked up the physical table in the shared cache assuming it would be
present. However, if all the tables don't fit in the cache, it might not be
present. I found this out in one of my tests that creates views using multiple
base tables with a smaller cache size.
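For illustration, the lookup now roughly follows this pattern (the helper names
below are made up, not the actual methods in the patch; the point is only that a
miss has to be handled once the shared cache can evict entries, and that only
the shared-cache access needs the lock):
{code:java}
// Hedged sketch, not the patch itself: lookupInSharedCache/resolveFromServer
// are hypothetical stand-ins for the real Phoenix calls.
private PTable getPhysicalTable(PTableKey key) throws SQLException {
    PTable physicalTable;
    synchronized (latestMetaDataLock) {
        // only the shared ConnectionQueryServices cache access is guarded
        physicalTable = lookupInSharedCache(key); // may return null after eviction
    }
    if (physicalTable == null) {
        // the cache is smaller than the working set of base tables/views,
        // so re-resolve instead of assuming the table is still cached
        physicalTable = resolveFromServer(key);
    }
    return physicalTable;
}
{code}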
The code in CreateTableCompiler creates a new connection holding the SCN at
which the base table was resolved. If we don't do a +1, we won't be able to see
the table in the cache (as our prune-tables method would remove it). There was
a bug previously where we were not pruning the tables, which is why it was
working before.
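To make the SCN handling concrete, here is a rough sketch of the idea
(resolvedTimestamp and the JDBC URL are placeholders and this is not the actual
CreateTableCompiler code; PhoenixRuntime.CURRENT_SCN_ATTRIB is the standard
property for opening a connection at a fixed SCN):
{code:java}
// The +1 keeps the table resolved at resolvedTimestamp visible, since pruning
// removes tables that are not visible at the connection's SCN.
Connection openAtScn(long resolvedTimestamp) throws SQLException {
    Properties props = new Properties();
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB,
            Long.toString(resolvedTimestamp + 1));
    return DriverManager.getConnection("jdbc:phoenix:localhost", props);
}
{code}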
I made the changes you suggested to TableInfo.
The PhoenixConnection changes are needed because we do not want to prune global
tables in a tenant-specific connection (in that case table.getTenantId() is
null and tenantId is not null).
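Roughly, the pruning condition looks like the sketch below (shouldPrune is an
illustrative name, and the timestamp check is my shorthand for "not visible at
the connection's SCN"; the tenant check is the part the PhoenixConnection
changes are about):
{code:java}
// Hedged sketch of the pruning condition, not the actual patch code.
private boolean shouldPrune(PTable table, PName connectionTenantId, long scn) {
    // on a tenant-specific connection, global tables have a null tenant id
    // while the connection's tenantId is non-null; never prune those
    boolean globalTableOnTenantConnection =
            table.getTenantId() == null && connectionTenantId != null;
    if (globalTableOnTenantConnection) {
        return false;
    }
    return table.getTimeStamp() >= scn; // not visible at this connection's SCN
}
{code}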
It's not possible to prune and clone in a single step now that I have moved the
clone into the synchronized block and the new PhoenixConnection out of the
synchronized block.
I reverted FormatToBytesWritableMapper changes.
We only need to synchronize while accessing the shared cache of the
ConnectionQueryServices and not the PhoenixConnection (since those aren't
shared).
Samarth had suggested using a ReadWriteLock instead of synchronizing on the
latestMetaDataLock object, using the read lock in the connect method and the
write lock everywhere else.
In metaDataMutated() we do a
{code}
latestMetaDataLock.wait(waitTime);
{code}
If I use a ReadWriteLock, should I just wait on the writeLock?
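For reference, a ReadWriteLock doesn't support Object.wait() directly, so the
wait would presumably become a Condition created from the write lock. A rough
sketch of what that could look like (field and method names are illustrative,
not the actual ConnectionQueryServicesImpl fields):
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class MetaDataLockSketch {
    private final ReentrantReadWriteLock metaDataLock = new ReentrantReadWriteLock();
    // Conditions can only be created from the write lock of a ReentrantReadWriteLock
    private final Condition metaDataChanged = metaDataLock.writeLock().newCondition();

    // Replacement for latestMetaDataLock.wait(waitTime): await() releases the
    // write lock while waiting and reacquires it when signalled or timed out.
    void awaitMetaDataChange(long waitTimeMs) throws InterruptedException {
        metaDataLock.writeLock().lock();
        try {
            metaDataChanged.await(waitTimeMs, TimeUnit.MILLISECONDS);
        } finally {
            metaDataLock.writeLock().unlock();
        }
    }

    // Replacement for latestMetaDataLock.notifyAll()
    void signalMetaDataChange() {
        metaDataLock.writeLock().lock();
        try {
            metaDataChanged.signalAll();
        } finally {
            metaDataLock.writeLock().unlock();
        }
    }
}
{code}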
> Write performance severely degrades with large number of views
> ---------------------------------------------------------------
>
> Key: PHOENIX-2995
> URL: https://issues.apache.org/jira/browse/PHOENIX-2995
> Project: Phoenix
> Issue Type: Bug
> Reporter: Mujtaba Chohan
> Assignee: Thomas D'Silva
> Labels: Argus
> Fix For: 4.8.1
>
> Attachments: PHOENIX-2995-v2.patch, PHOENIX-2995-v3.patch,
> PHOENIX-2995-v4.patch, PHOENIX-2995.patch, create_view_and_upsert.png,
> image.png, image2.png, image3.png, upsert_rate.png
>
>
> Write performance for each 1K batch degrades significantly when there are
> *10K* views being written to at random with the default
> {{phoenix.client.maxMetaDataCacheSize}}. With all views created, the upsert rate
> remains around 25 seconds per 1K batch, i.e. an upsert rate of ~2K rows/min.
> When {{phoenix.client.maxMetaDataCacheSize}} is increased to 100MB+, the views
> do not need to be re-resolved and the upsert rate gets back to the normal ~60K
> rows/min.
> With *100K* views and {{phoenix.client.maxMetaDataCacheSize}} set to 1GB, I
> wasn't able to create all 100K views as the upsert time for each 1K batch keeps
> steadily increasing.
> The following graph shows the 1K-batch upsert rate over time as the number of
> views varies. Rows are upserted to random views; {{CREATE VIEW IF NOT EXISTS ...
> APPEND_ONLY_SCHEMA = true, UPDATE_CACHE_FREQUENCY=900000}} is executed before
> the upsert statement.
> !upsert_rate.png!
> The base table is also created with {{APPEND_ONLY_SCHEMA = true,
> UPDATE_CACHE_FREQUENCY = 900000, AUTO_PARTITION_SEQ}}.
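For reference, a hedged example of how a client could raise the metadata cache
size discussed in the description (the property key is the one from the report;
the value and URL are placeholders):
{code:java}
Properties props = new Properties();
props.setProperty("phoenix.client.maxMetaDataCacheSize",
        Long.toString(100L * 1024L * 1024L)); // ~100MB instead of the default
Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
{code}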
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)