[
https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788643#comment-13788643
]
Christopher Tubbs commented on ACCUMULO-391:
--------------------------------------------
A few issues:
* shouldUseLocalIterators was added as a deprecated, public method
** it didn't exist in a prior version, so it shouldn't be deprecated. If it's
not needed, it should be removed.
** our internal code still uses it, though it's deprecated. We shouldn't use
our own deprecated code.
* getters changed without deprecation
** setupIterators
** getTabletLocator
* TableQueryConfig was placed in o.a.a.core.conf
** o.a.a.core.conf isn't really part of the public API; it's essentially for
server-side configuration representation, though we use it internal to some
client code
** precedent for narrowly scoped config is in the package in which its
corresponding code exists (see o.a.a.core.client.BatchWriterConfig)
* Javadoc
** Empty @return statements
** Javadocs advise using deprecated methods
** Unnecessary change of variable name "context" in getSplits to "conf", with
incorrect description based on new name rather than the object type
** Incorrect Javadoc description on setTableQueryConfigs. It is not setting the
objects on a Hadoop configuration. It is setting the configuration on the job.
This is a minor thing, but the additional precision in language goes a long way
towards clear documentation, especially for non-native English speakers and
people less familiar with the way MapReduce works in Hadoop.
* TableQueryConfig
** For consistency and clarity, this should be named to match our other query
code. Perhaps instead of "TableQueryConfig", it might be better to call it
BatchScanConfig, similar to BatchWriterConfig.
** Instead of the ambiguous setTableQueryConfigs, perhaps a method called
Map<String, BatchScanConfig>, to make this object more re-usable.
** getter should be protected
Overall, given the significance of the changes to the API of the MapReduce
code, for a limited application (most existing users of this class will
probably continue to only scan one table at a time), I think it'd be better to
put this code in a separate InputFormat class. This should be especially
concerning because we've recently just stabilized the M/R code in 1.5.0,
ironing out existing issues, and this is a bit too disruptive (deprecating
brand new methods in 1.5, like setTableName and setRanges, for instance).
Another reason this might be good as a separate class... is that we could
actually have the existing single table version extend the multi-table version
as a specialized case. That would save on maintenance costs of two
implementations, but leave the existing stable and familiar API in tact until
the new one is proven and stable. And then (if we really wanted to) we could
deprecate the single table version as a whole, rather than deprecating half of
it, and trying to maintain a half-deprecated half-current class.
> Multi-table Accumulo input format
> ---------------------------------
>
> Key: ACCUMULO-391
> URL: https://issues.apache.org/jira/browse/ACCUMULO-391
> Project: Accumulo
> Issue Type: New Feature
> Reporter: John Vines
> Assignee: Corey J. Nolet
> Priority: Minor
> Labels: mapreduce,
> Fix For: 1.6.0
>
> Attachments: ACCUMULO-391.patch, multi-table-if.patch,
> new-multitable-if.patch
>
>
> Just realized we had no MR input method which supports multiple Tables for an
> input format. I would see it making the table the mapper's key and making the
> Key/Value a tuple, or alternatively have the Table/Key be the key tuple and
> stick with Values being the value.
--
This message was sent by Atlassian JIRA
(v6.1#6144)