[
https://issues.apache.org/jira/browse/ACCUMULO-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452801#comment-13452801
]
Christopher Tubbs commented on ACCUMULO-759:
--------------------------------------------
I do like the call chaining that returning a Scanner makes possible, though I
still prefer "append" over "add", since "add" tends to get overloaded and become
confusing. Also, if "append" is the behavior for scan-time iterators, without
the priority, then the term "scan" can be dropped from the method name. So,
"appendScanIterator(ScanIteratorSetting)" becomes
"scanner.appendIterator(IteratorSetting)".
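
As a rough illustration of the chaining and naming suggested above, here is a
minimal, self-contained sketch. SketchScanner and SketchIteratorSetting are
hypothetical stand-ins invented for this example; they are not Accumulo's
Scanner or IteratorSetting, and the sketch only models call order, not any
real scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an iterator setting with no priority field.
// This is NOT Accumulo's IteratorSetting.
final class SketchIteratorSetting {
    final String name;

    SketchIteratorSetting(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for a scanner. This is NOT Accumulo's Scanner.
final class SketchScanner {
    private final List<SketchIteratorSetting> scanIterators = new ArrayList<>();

    // Appends after every previously appended iterator and returns this
    // scanner, so calls can be chained.
    SketchScanner appendIterator(SketchIteratorSetting setting) {
        scanIterators.add(setting);
        return this;
    }

    // Iterator names in the order they were appended.
    List<String> iteratorNames() {
        List<String> names = new ArrayList<>();
        for (SketchIteratorSetting setting : scanIterators) {
            names.add(setting.name);
        }
        return Collections.unmodifiableList(names);
    }
}

final class AppendSketchDemo {
    public static void main(String[] args) {
        SketchScanner scanner = new SketchScanner()
            .appendIterator(new SketchIteratorSetting("projection"))
            .appendIterator(new SketchIteratorSetting("summary"));
        System.out.println(scanner.iteratorNames());
    }
}
```

With no priority argument, the order of the calls is the only thing that
determines the order of the appended iterators.
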

Also, the boolean seems to achieve the same thing as the convention of <1024
vs. >=1024 (scan iterators would just start at 1024, with +1 for each
successive iterator appended). However, the boolean is more restrictive than
this, because it prevents insertion of an iterator at other points in the scan.
So, I guess it comes down to whether or not the current behavior should be
modified in this restrictive way. Personally, I think it shouldn't be. Consider
two use cases:

TableA is configured with a per-table iterator that groups and displays rows as
JSON upon query. A query framework is built on this table that allows users to
filter out particular columns from each row at scan time (relational algebra
projection). However, the view will always be JSON. It seems reasonable to set
a per-table iterator that converts rows to JSON at priority 500, and at
scan-time, inject the filtering iterator at priority 400.
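
To make the ordering in this use case concrete, here is a small sketch of how
a priority-sorted stack would place the two iterators, alongside appended
iterators numbered from 1024. The class names, the iterator names
("rowsToJson", "columnFilter"), and the rule that lower priority numbers are
applied first are assumptions made for the example; this is a toy model, not
Accumulo code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of one entry in an iterator stack; not an Accumulo class.
final class StackEntry {
    final int priority;
    final String name;

    StackEntry(int priority, String name) {
        this.priority = priority;
        this.name = name;
    }
}

// Hypothetical model of a combined per-table and scan-time iterator stack.
final class IteratorStackSketch {
    // Assumed starting point for appended iterators, taken from the
    // <1024 vs. >=1024 convention discussed in this comment.
    static final int FIRST_APPENDED_PRIORITY = 1024;

    private final List<StackEntry> entries = new ArrayList<>();
    private int nextAppendedPriority = FIRST_APPENDED_PRIORITY;

    // Per-table configuration, or scan-time injection at an explicit priority.
    IteratorStackSketch insert(int priority, String name) {
        entries.add(new StackEntry(priority, name));
        return this;
    }

    // Append: takes the next priority at or above 1024, +1 per call.
    IteratorStackSketch append(String name) {
        return insert(nextAppendedPriority++, name);
    }

    // Assumed rule: ascending priority, so lower numbers see the data first.
    // The sort is stable, so equal priorities keep their insertion order.
    List<String> applicationOrder() {
        List<StackEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((StackEntry e) -> e.priority));
        List<String> names = new ArrayList<>();
        for (StackEntry entry : sorted) {
            names.add(entry.name);
        }
        return names;
    }
}

final class IteratorStackDemo {
    public static void main(String[] args) {
        IteratorStackSketch stack = new IteratorStackSketch()
            .insert(500, "rowsToJson")     // per-table iterator
            .insert(400, "columnFilter")   // injected at scan time, below 500
            .append("appendedIterator");   // appended at scan time, gets 1024
        System.out.println(stack.applicationOrder());
    }
}
```

In this model the injected filter runs before the JSON conversion, which an
append-only API could not express, because appended iterators always land
after the per-table ones.
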

Now, this is a trivial example, where users are constrained to a particular
view that could just as easily be added at scan time. However, consider the use
case where an iterator is applied to a table to enforce a view policy that is
intended to protect patient privacy or enforce a DRM scheme on multimedia
content. Such an iterator may allow lower-priority filters, but could only show
counts of the matching results. Alternatively, if such an iterator is given the
proper payment method, it could encode the data with a DRM scheme to lease the
queried content to a subscriber for some requested period of time.

These are just a few examples of why I think it would be too constraining to
only allow appending scan-time iterators and not allow injecting them at a
lower priority.

> remove priority setting for scan-time iterators
> -----------------------------------------------
>
> Key: ACCUMULO-759
> URL: https://issues.apache.org/jira/browse/ACCUMULO-759
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Adam Fuchs
> Labels: newbie
>
> Iterators have a priority setting that allows a user to order iterators
> arbitrarily. However, that priority is an integer that doesn't directly convey
> the iterator's relationship to other iterators. I would postulate that nobody
> has ever needed to sneak in a scan-time iterator underneath a configured
> table iterator (please let me know if I'm wrong about this), and the effect
> of doing so is not easy to calculate. Many people have chosen a bad iterator
> priority and seen commutativity problems with previously configured iterators.
> I propose that we use more of an agglomerative approach to configuring
> scan-time iterators, in which the order of the iterator tree is the same
> order in which the addScanIterator method is called, and all scan-time
> iterators apply after the configured iterators apply. The change to the API
> should just be to remove the priority number, and the existing
> IteratorSetting constructor and accessors should be deprecated.
> With this change, we can think of an iterator as more of a functional
> modification to a data set, as in T' = f(T) or T'' = g(f(T)). This should
> make it easier for developers to use iterators correctly.