[
https://issues.apache.org/jira/browse/HBASE-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467577#comment-13467577
]
Jason Dai commented on HBASE-6805:
----------------------------------
[~apurtell] The updated patch file includes one such example: the CP
automatically encrypts the value of each cell stored in the table, so it must
decrypt the cell before the filter operations (filterKeyValue, transform,
etc.) are applied. Implementing this in the filter CP makes the encryption
transparent to user code. Similarly, for DOT, the CP encodes multiple fields
in a single cell, and each field must be extracted before the filter
operations are applied so that the encoding stays transparent to the user.
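
To make the encryption case concrete, here is a minimal, self-contained
sketch. It is not the code in the patch and uses no HBase types: XorCodec,
PreHookedFilter and StartsWithFilter are hypothetical names, and the XOR
"cipher" is only a toy stand-in for real encryption. The point is that the
user filter is written against plaintext and never sees the stored form.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Toy reversible codec standing in for real encryption (assumption, NOT secure). */
final class XorCodec {
    private final byte key;

    XorCodec(byte key) {
        this.key = key;
    }

    /** XOR is its own inverse, so this one method both encodes and decodes. */
    byte[] apply(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ key);
        }
        return out;
    }
}

/** Hypothetical user filter: keeps values whose plaintext starts with a prefix. */
final class StartsWithFilter implements Predicate<byte[]> {
    private final String prefix;

    StartsWithFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean test(byte[] value) {
        return new String(value, StandardCharsets.UTF_8).startsWith(prefix);
    }
}

/** Hypothetical wrapper: runs a "pre-filter" hook, then the unmodified user filter. */
final class PreHookedFilter implements Predicate<byte[]> {
    private final UnaryOperator<byte[]> preFilterHook;
    private final Predicate<byte[]> userFilter;

    PreHookedFilter(UnaryOperator<byte[]> preFilterHook, Predicate<byte[]> userFilter) {
        this.preFilterHook = preFilterHook;
        this.userFilter = userFilter;
    }

    @Override
    public boolean test(byte[] storedValue) {
        // The hook decodes the stored bytes; the user filter only sees plaintext.
        byte[] plain = preFilterHook.apply(storedValue);
        return userFilter.test(plain);
    }
}
```

Applied directly to the stored bytes, the same StartsWithFilter would not
match, which is why the decode step has to run before the filter operation.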
bq. If extending the CP hook model to internal filter methods, we must be
deeply concerned about the costs of iterating CP hook lists during
filtering/scanning. CPs extend the code path, first of all. Then, if hooks are
registered, there will be method invocation and object allocation costs for
_every_ filter operation, twice.
While there are two method invocations for each filter operation, they are
made only for the topmost filter (the one that FilterWrapper wraps), not for
each filter contained in a chained FilterList or other composite filter. In
our DOT benchmarking, these CP operations were never a hotspot in scanning.
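
The following sketch illustrates why the hook cost does not grow with the
size of the filter tree. It is a simplified model, not FilterWrapper itself:
AndFilterList and HookedTopLevelFilter are hypothetical stand-ins, and the
counter merely represents the pre/post hook calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical composite filter: a cell passes only if every inner filter accepts it. */
final class AndFilterList implements Predicate<String> {
    private final List<Predicate<String>> filters;

    AndFilterList(List<Predicate<String>> filters) {
        this.filters = new ArrayList<>(filters);
    }

    @Override
    public boolean test(String cell) {
        for (Predicate<String> f : filters) {
            if (!f.test(cell)) {
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical wrapper around the topmost filter only; counts hook invocations. */
final class HookedTopLevelFilter implements Predicate<String> {
    private final Predicate<String> topmost;
    final AtomicInteger hookCalls = new AtomicInteger();

    HookedTopLevelFilter(Predicate<String> topmost) {
        this.topmost = topmost;
    }

    @Override
    public boolean test(String cell) {
        hookCalls.incrementAndGet();          // stand-in for the pre hook
        boolean result = topmost.test(cell);  // inner filters run without hooks
        hookCalls.incrementAndGet();          // stand-in for the post hook
        return result;
    }
}
```

In this model, wrapping a list of three filters still costs two hook calls
per cell, the same as wrapping a single filter.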
Having said that, CP operations could become a performance issue if a long
list of CPs is loaded. For instance, database-trigger-like CPs only do work
upon data mutation (e.g., Put), but are still invoked for Get/Scan/Filter. One
way to address this is that, instead of iterating the global _coprocessor_ set
in these pre* & post* operations, the RegionCoprocessorHost maintains several
CP sets and iterates a different set for each kind of CP operation: one for
region operations (preOpen/postOpen/preClose/...), one for update (prePut &
postPut), one for read (preGet/postGet/preScannerOpen/...), and one for filter
(preFilterKeyValue/postFilterKeyValue/...). When each CP is loaded, it is
registered in the appropriate sets (just as endpoints are registered in
_Region.protocolHandlers_).
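
A rough sketch of that partitioning is below. It is only an illustration of
the idea, not RegionCoprocessorHost code: HookKind, Observer, SimpleObserver
and PartitionedHost are hypothetical names, and it assumes each CP can
declare up front which kinds of operations it cares about.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical operation categories, matching the four sets described above. */
enum HookKind { REGION, UPDATE, READ, FILTER }

/** Hypothetical observer contract: a name plus the categories it wants to see. */
interface Observer {
    String name();
    Set<HookKind> interests();
}

final class SimpleObserver implements Observer {
    private final String name;
    private final Set<HookKind> interests;

    SimpleObserver(String name, Set<HookKind> interests) {
        this.name = name;
        this.interests = EnumSet.copyOf(interests);
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public Set<HookKind> interests() {
        return Collections.unmodifiableSet(interests);
    }
}

/** Hypothetical host that keeps one observer list per category. */
final class PartitionedHost {
    private final Map<HookKind, List<Observer>> byKind = new EnumMap<>(HookKind.class);

    PartitionedHost() {
        for (HookKind kind : HookKind.values()) {
            byKind.put(kind, new ArrayList<>());
        }
    }

    /** At load time, register the observer only in the sets it declared. */
    void load(Observer observer) {
        for (HookKind kind : observer.interests()) {
            byKind.get(kind).add(observer);
        }
    }

    /** A pre/post hook iterates only the list for its own category. */
    List<Observer> observersFor(HookKind kind) {
        return Collections.unmodifiableList(byKind.get(kind));
    }
}
```

With a trigger-like observer registered for UPDATE only and a DOT-like
observer registered for READ and FILTER, a filter hook iterates a list of
one, and the trigger CP is never touched on the scan path.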
> Extend co-processor framework to provide observers for filter operations
> ------------------------------------------------------------------------
>
> Key: HBASE-6805
> URL: https://issues.apache.org/jira/browse/HBASE-6805
> Project: HBase
> Issue Type: Sub-task
> Components: Coprocessors
> Affects Versions: 0.96.0
> Reporter: Jason Dai
> Attachments: extend_coprocessor.patch
>
>
> There are several filter operations (e.g., filterKeyValue, filterRow,
> transform, etc.) at the region server side that either exclude KVs from the
> returned results, or transform the returned KV. We need to provide observers
> (e.g., preFilterKeyValue and postFilterKeyValue) for these operations in the
> same way as the observers for other data access operations (e.g., preGet and
> postGet). This extension is needed to support DOT (e.g., extracting
> individual fields from the document in the observers before passing them to
> the related filter operations).