[jira] [Commented] (TRAFODION-1729) change the coprocessor deployment method

liu ming (JIRA) Thu, 31 Dec 2015 06:24:08 -0800

    [ 
https://issues.apache.org/jira/browse/TRAFODION-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075963#comment-15075963
 ]


liu ming commented on TRAFODION-1729:
-------------------------------------

Thanks very much for your comments, Stack. Let me try to clarify the items one 
by one.
1. Overloading
To implement transaction semantics, Trafodion saves the updated rows in the 
RegionServer (RS) heap, in what we call the writeOrdering list. All Puts and 
Deletes are saved in this list and do not go into the HBase Region until 
commit, so other transactions cannot see the updates. 
So, before commit, when the current transaction does a scan, Trafodion needs 
to filter out the rows it has 'deleted' and add the newly put rows, which are 
still in memory and not yet in the Region.
To achieve this, in the coprocessor, Trafodion needs to invoke a protected 
HRegion method:
protected RegionScanner getScanner(Scan scan,
                       List<KeyValueScanner> additionalScanners,
                       boolean copyCellsFromSharedMem)
                            throws IOException
This method combines the results of two scanners into the final scan result. 
The first scanner is equipped with a filter that removes the 'deleted' rows, 
i.e. those matching the Delete objects in the writeOrdering list. But newly 
inserted or updated rows are Put objects in the writeOrdering list as well, so 
Trafodion needs an additional scanner (passed via additionalScanners) to read 
those rows from the writeOrdering list and combine them into the result.
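
To make the overlay described above concrete, here is a minimal sketch of the 
idea only. It does not use the HBase API: the class WriteOrderingSketch, the 
PendingWrite type and the scan() helper are hypothetical names invented for 
illustration, and rows are plain strings rather than Cells. A pending entry 
with a null value stands for a Delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these names exist in Trafodion or HBase.
class WriteOrderingSketch {
    /** One uncommitted action of the current transaction. */
    static final class PendingWrite {
        final String row;
        final String value; // null means this entry is a Delete
        PendingWrite(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /**
     * Returns what the current transaction should see: the committed rows,
     * minus rows it has deleted, plus rows it has put but not yet committed.
     * The committed map itself is left untouched.
     */
    static Map<String, String> scan(Map<String, String> committed,
                                    List<PendingWrite> writeOrdering) {
        TreeMap<String, String> view = new TreeMap<>(committed);
        for (PendingWrite w : writeOrdering) { // replay in write order
            if (w.value == null) {
                view.remove(w.row);        // filter out a 'deleted' row
            } else {
                view.put(w.row, w.value);  // add or overwrite a pending Put
            }
        }
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> region = new TreeMap<>();
        region.put("r1", "a");
        region.put("r2", "b");
        List<PendingWrite> writeOrdering = new ArrayList<>();
        writeOrdering.add(new PendingWrite("r2", null)); // Delete r2
        writeOrdering.add(new PendingWrite("r3", "c"));  // Put r3
        System.out.println(scan(region, writeOrdering)); // {r1=a, r3=c}
    }
}
```

In the real coprocessor the same effect comes from a filter on the Region 
scanner plus an additional scanner over the writeOrdering list, not from 
copying the data into a map.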
But that getScanner() method is protected, so Trafodion overloads HRegion and 
makes the method public in order to invoke it from within the coprocessor. 
Currently, that is the only reason Trafodion overloads HRegion.
Our proposal is to use Java reflection to invoke this 'protected' method 
without the overload. It is invoked only once, during scanner construction, so 
we expect no noticeable performance impact. That is the idea of this 
proposal.
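
As a rough sketch of the reflection approach, the example below calls a 
protected method through java.lang.reflect. FakeRegion is a hypothetical 
stand-in for HRegion, and its simplified getScanner(String, boolean) signature 
is an assumption made to keep the example self-contained; the real call would 
look up the three-argument HRegion method shown earlier.

```java
import java.lang.reflect.Method;

// Hypothetical illustration; FakeRegion is NOT the real HBase HRegion class.
class ReflectionSketch {
    static class FakeRegion {
        // Stand-in for the protected HRegion.getScanner(...) method.
        protected String getScanner(String scan, boolean copyCellsFromSharedMem) {
            return "scanner(" + scan + "," + copyCellsFromSharedMem + ")";
        }
    }

    /** Looks up the protected method by name and parameter types, then invokes it. */
    static String invokeProtected(FakeRegion region, String scan) {
        try {
            Method m = FakeRegion.class.getDeclaredMethod(
                    "getScanner", String.class, boolean.class);
            // Needed when the caller is in a different package from the
            // target class, as a coprocessor would be. In this single-file
            // sketch the call would also compile directly.
            m.setAccessible(true);
            return (String) m.invoke(region, scan, false);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("getScanner is not reachable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeProtected(new FakeRegion(), "scan-1"));
        // prints: scanner(scan-1,false)
    }
}
```

One trade-off of this approach is that the method name and parameter types are 
only checked at run time, so a signature change in a later HBase release would 
surface as an exception rather than a compile error.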

2. Phoenix
I was wrong about Phoenix. I played with it long ago and, as I remember, I 
just downloaded a tar file, untarred it, and got Phoenix working without any 
change to my HBase system, which is why I said that. My memory is probably 
wrong, but the point is that we wish to avoid making too many changes to HBase 
and restarting it in order to use Trafodion.

Also, I don't know whether it would be possible to make that getScanner() 
method public in a future HBase release. If that could be done, it would help 
Trafodion development a lot.

thanks.

> change the coprocessor deployment method
> ----------------------------------------
>
>                 Key: TRAFODION-1729
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1729
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: dtm
>            Reporter: liu ming
>            Assignee: mashengchen
>
> I have a proposal to change our current HBase coprocessor configuration 
> method. 
> There are three ways to add a coprocessor to an HBase table:
> 1.       By editing hbase-site.xml, which loads the coprocessor for ALL 
> tables (Trafodion is using this method now)
> 2.       Via an HBase shell command
> 3.       Via the HTableDescriptor.addCoprocessor() Java API
> Trafodion currently uses the first method. I propose using method 3; I 
> have finished a prototype, and testing suggests it works well.
>  
> Here are the reasons I propose for this change:
> At present, the Trafodion installer needs to modify hbase-site.xml and 
> then restart the HBase instance for the configuration to take effect. This 
> step not only complicates the installer but also makes users think Trafodion 
> is intrusive into the underlying HBase system. It would be ideal if we could 
> avoid this step. Another problem: in CDH, there is a concept called ‘region 
> server group’ or something similar, so the settings have to be carefully 
> handled by the installer to apply to all groups. As we saw recently in the 
> WebRoot deployment, Trafodion failed for this reason. All of this is very 
> error-prone and complicates the Trafodion installer. Once CDH or HDP changes 
> something, Trafodion may fail again.
>  
> So I spent some time investigating why we need to restart HBase in order to 
> install Trafodion.  
> As I understand it, there are 3 major reasons: 
> 1.       To add the hbase-trx coprocessors
> 2.       To overload HRegion with TransactionalRegion
> 3.       Various configuration settings, which need to be checked one by one.
> The first can be avoided by applying my proposed change.
> As for the second, I looked through TransactionalRegion.java and found that 
> the only reason (now) is to overload the getScanner() method to be public so 
> that it can be invoked by the coprocessor. And there are only 1 or 2 places 
> where that API is invoked in the Trafodion code. I checked with Kevin, and he 
> suggested that by using ‘java reflection’ we can avoid this as well. 
> All other configuration items look, to some extent, like ‘nice to have’ 
> rather than ‘must have’. I also found two config items that seem never to be 
> used:
> hbase.bulkload.staging.dir     /hbase-staging         (Suresh can confirm, 
> but I searched all the code and it seems this is never used)
> hbase.regionserver.region.transactional.tlog   true     (Narendra can 
> confirm; this is NEVER used, maybe a legacy config item?)
> Yes, for now there are still some other config items that seem unavoidable, 
> but I hope we can find some way to remove them in the future. I am not trying 
> to solve all issues right now; I just want to start the effort to remove 
> unnecessary HBase reconfiguration.
> For example, coprocessors can be added to a table at run time, with no need 
> to edit hbase-site.xml and restart HBase. This is only the first step toward 
> removing the deep impact on the current HBase config and the need to restart 
> HBase.
>  
> So I am asking for your opinions about this change. If you think it is 
> necessary, I will go ahead and file a JIRA and fix it. 
>  
> I strongly recommend getting rid of the ‘modify hbase-site.xml and 
> restart your hbase’ step for Trafodion installation. It should be an option 
> for tuning the system to best suit Trafodion, not a forced step. Note that 
> Apache Phoenix is also SQL on HBase, and its installation changes nothing in 
> the underlying HBase: it is very lightweight and does not ‘intrude into’ the 
> existing HBase system. Trafodion is considered heavy and intrusive in this 
> respect, and I feel we may be able to change this.
>  
> Should I start this discussion in the dev mail list?
>  
> P.S. Here is a list of the changed config items. My proposal will remove the 
> last one; I hope we can eventually get rid of all of them:
> hbase.master.distributed.log.splitting      false
> hbase.snapshot.master.timeoutMillis      600000
> hbase_regionserver_lease_period         600000
> hbase.hregion.impl                     
> org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion
> hbase.regionserver.region.split.policy      
> org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy
> hbase.snapshot.enabled                 true
> hbase.bulkload.staging.dir                /hbase-staging
> hbase.regionserver.region.transactional.tlog  true
> hbase.snapshot.region.timeout            600000
> hbase_coprocessor_region_classes  
> org.apache.hadoop.hbase.coprocessor.transactional.TrxRegionObserver,org.apache.hadoop.hbase.coprocessor.transactional.TrxRegionEndpoint,org.apache.hadoop.hbase.coprocessor.AggregateImplementation
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
