[ 
https://issues.apache.org/jira/browse/HBASE-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467577#comment-13467577
 ] 

Jason Dai commented on HBASE-6805:
----------------------------------

[~apurtell] The updated patch file includes one possible use case: the 
value of each cell stored in the table is automatically encrypted by the CP, 
which then needs to decrypt the cell before filter operations 
(filterKeyValue, transform, etc.) are applied. By implementing the filter CP, 
the encryption can be made transparent to user code. Similarly, for DOT, 
multiple fields are encoded in a single cell by the CP, and each field needs 
to be extracted before filter operations are applied, so that the encoding is 
transparent to the user.
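
To illustrate the idea of decoding a cell before the user's filter sees it, here is a minimal, self-contained Java sketch. It does not use any HBase classes: TransparentDecryptFilter, xor and withPreFilterHook are hypothetical names invented for this example, and the XOR "cipher" is only a stand-in for real encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these types are HBase classes.
class TransparentDecryptFilter {

    // Stand-in for real encryption: XOR every byte with a fixed key.
    // XOR is its own inverse, so the same method encodes and decodes.
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Wraps a user filter so that it is applied to the decoded value,
    // which is what a pre-filter hook would arrange on the server side.
    static Predicate<byte[]> withPreFilterHook(UnaryOperator<byte[]> decode,
                                               Predicate<byte[]> userFilter) {
        return stored -> userFilter.test(decode.apply(stored));
    }

    public static void main(String[] args) {
        byte key = 0x5A;
        byte[] stored = xor("alice".getBytes(StandardCharsets.UTF_8), key);

        // The user filter is written against plaintext values only.
        Predicate<byte[]> userFilter =
            v -> new String(v, StandardCharsets.UTF_8).startsWith("al");

        Predicate<byte[]> hooked = withPreFilterHook(v -> xor(v, key), userFilter);
        System.out.println("matches with hook: " + hooked.test(stored));
        System.out.println("matches without hook: " + userFilter.test(stored));
    }
}
```

The user filter never needs to know that the stored bytes are encoded; only the hook does.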

bq. If extending the CP hook model to internal filter methods, we must be 
deeply concerned about the costs of iterating CP hook lists during 
filtering/scanning. CPs extend the code path, first of all. Then, if hooks are 
registered, there will be method invocation and object allocation costs for 
_every_ filter operation, twice.

While there are two hook invocations for each filter operation, they are made 
only for the topmost filter (the one that FilterWrapper wraps), not for each 
filter contained in a chained FilterList or other composite filter. In our DOT 
benchmarking, these CP operations were never the hotspot in scanning.
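
The cost argument can be sketched in plain Java as follows. This is not the patch's FilterWrapper: TopLevelWrapperDemo, HookedFilter and allOf are hypothetical names, and a Predicate stands in for a filter. The point is only that the pre/post counters grow with the number of cells, not with the number of nested filters.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; Predicate stands in for an HBase filter.
class TopLevelWrapperDemo {

    // Wraps only the topmost filter and counts the pre/post hook calls.
    static final class HookedFilter<T> implements Predicate<T> {
        private final Predicate<T> topmost;
        int preCalls = 0;
        int postCalls = 0;

        HookedFilter(Predicate<T> topmost) {
            this.topmost = topmost;
        }

        @Override
        public boolean test(T cell) {
            preCalls++;                        // "pre" hook: once per cell
            boolean keep = topmost.test(cell); // nested filters run unhooked
            postCalls++;                       // "post" hook: once per cell
            return keep;
        }
    }

    // A composite filter: a cell passes only if every child accepts it.
    static <T> Predicate<T> allOf(List<Predicate<T>> children) {
        return cell -> children.stream().allMatch(p -> p.test(cell));
    }

    public static void main(String[] args) {
        List<Predicate<Integer>> children =
            List.of(x -> x > 0, x -> x < 10, x -> x % 2 == 0);
        HookedFilter<Integer> wrapped = new HookedFilter<>(allOf(children));

        for (int cell : new int[] {2, 3, 12, 4}) {
            wrapped.test(cell);
        }
        // Four cells were filtered through three nested filters.
        System.out.println(wrapped.preCalls + " pre, " + wrapped.postCalls + " post");
    }
}
```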

Having said that, CP operations could become a performance issue if a long 
list of CPs is loaded. For instance, database-trigger-like CPs only need to 
execute upon data mutation (i.e., Put), but are still invoked for 
Get/Scan/Filter. One way to address this is for the RegionCoprocessorHost to 
maintain several CP sets, instead of iterating the global _coprocessor_ set in 
every pre* and post* operation, and to iterate a different set for each kind 
of CP operation: one for region operations (preOpen/postOpen/preClose/...), 
one for update (prePut and postPut), one for read 
(preGet/postGet/preScannerOpen/...), and one for filter 
(preFilterKeyValue/postFilterKeyValue/...). When a CP is loaded, it can be 
registered in the appropriate sets (just as endpoints are registered in 
_Region.protocolHandlers_).
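
A rough sketch of this per-operation registration, in plain Java, might look like the following. It is not the RegionCoprocessorHost implementation: CategorizedCoprocessorHost, HookCategory, load and hooksFor are hypothetical names, and each CP is assumed to declare which categories of hooks it implements.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not HBase APIs.
class CategorizedCoprocessorHost {

    enum HookCategory { REGION, UPDATE, READ, FILTER }

    interface Coprocessor {
        String name();
        EnumSet<HookCategory> categories(); // hook kinds this CP implements
    }

    // One list per category instead of a single global list.
    private final Map<HookCategory, List<Coprocessor>> byCategory =
        new EnumMap<>(HookCategory.class);

    CategorizedCoprocessorHost() {
        for (HookCategory c : HookCategory.values()) {
            byCategory.put(c, new ArrayList<>());
        }
    }

    // At load time, register the CP only in the sets it declared.
    void load(Coprocessor cp) {
        for (HookCategory c : cp.categories()) {
            byCategory.get(c).add(cp);
        }
    }

    // A pre*/post* operation iterates only the list for its own category.
    List<Coprocessor> hooksFor(HookCategory c) {
        return byCategory.get(c);
    }

    // Convenience factory for the example.
    static Coprocessor coprocessor(String name, HookCategory first, HookCategory... rest) {
        EnumSet<HookCategory> cats = EnumSet.of(first, rest);
        return new Coprocessor() {
            public String name() { return name; }
            public EnumSet<HookCategory> categories() { return cats; }
        };
    }

    public static void main(String[] args) {
        CategorizedCoprocessorHost host = new CategorizedCoprocessorHost();
        host.load(coprocessor("trigger", HookCategory.UPDATE));
        host.load(coprocessor("dot", HookCategory.READ, HookCategory.FILTER));
        // The trigger-like CP is never visited on the filter path.
        for (Coprocessor cp : host.hooksFor(HookCategory.FILTER)) {
            System.out.println("filter hook: " + cp.name());
        }
    }
}
```

With this layout, a trigger-like CP registered only for updates adds no iteration cost to Get/Scan/Filter.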
                
> Extend co-processor framework to provide observers for filter operations
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6805
>                 URL: https://issues.apache.org/jira/browse/HBASE-6805
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Coprocessors
>    Affects Versions: 0.96.0
>            Reporter: Jason Dai
>         Attachments: extend_coprocessor.patch
>
>
> There are several filter operations (e.g., filterKeyValue, filterRow, 
> transform, etc.) at the region server side that either exclude KVs from the 
> returned results, or transform the returned KV. We need to provide observers 
> (e.g., preFilterKeyValue and postFilterKeyValue) for these operations in the 
> same way as the observers for other data access operations (e.g., preGet and 
> postGet). This extension is needed to support DOT (e.g., extracting 
> individual fields from the document in the observers before passing them to 
> the related filter operations) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
