[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format

Christopher Tubbs (JIRA) Mon, 07 Oct 2013 15:20:13 -0700

    [ 
https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788643#comment-13788643
 ]


Christopher Tubbs commented on ACCUMULO-391:
--------------------------------------------

A few issues:

* shouldUseLocalIterators was added as a deprecated, public method
** it didn't exist in a prior version, so it shouldn't be deprecated. If it's 
not needed, it should be removed.
** our internal code still uses it, though it's deprecated. We shouldn't use 
our own deprecated code.

* getters changed without deprecation
** setupIterators
** getTabletLocator

* TableQueryConfig was placed in o.a.a.core.conf
** o.a.a.core.conf isn't really part of the public API; it's essentially for 
server-side configuration representation, though we use it internal to some 
client code
** precedent for narrowly scoped config is in the package in which its 
corresponding code exists (see o.a.a.core.client.BatchWriterConfig)

* Javadoc
** Empty @return statements
** Javadocs advise using deprecated methods
** Unnecessary change of variable name "context" in getSplits to "conf", with 
incorrect description based on new name rather than the object type
** Incorrect Javadoc description on setTableQueryConfigs. It is not setting the 
objects on a Hadoop configuration. It is setting the configuration on the job. 
This is a minor thing, but the additional precision in language goes a long way 
towards clear documentation, especially for non-native English speakers and 
people less familiar with the way MapReduce works in Hadoop.

* TableQueryConfig
** For consistency and clarity, this should be named to match our other query 
code. Perhaps instead of "TableQueryConfig", it might be better to call it 
BatchScanConfig, similar to BatchWriterConfig.
** Instead of the ambiguous setTableQueryConfigs, perhaps a method called 
Map<String, BatchScanConfig>, to make this object more re-usable.
** getter should be protected

Overall, given the significance of the changes to the API of the MapReduce 
code, for a limited application (most existing users of this class will 
probably continue to only scan one table at a time), I think it'd be better to 
put this code in a separate InputFormat class. This should be especially 
concerning because we've recently just stabilized the M/R code in 1.5.0, 
ironing out existing issues, and this is a bit too disruptive (deprecating 
brand new methods in 1.5, like setTableName and setRanges, for instance).

Another reason this might be good as a separate class... is that we could 
actually have the existing single table version extend the multi-table version 
as a specialized case. That would save on maintenance costs of two 
implementations, but leave the existing stable and familiar API in tact until 
the new one is proven and stable. And then (if we really wanted to) we could 
deprecate the single table version as a whole, rather than deprecating half of 
it, and trying to maintain a half-deprecated half-current class.

> Multi-table Accumulo input format
> ---------------------------------
>
>                 Key: ACCUMULO-391
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-391
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: John Vines
>            Assignee: Corey J. Nolet
>            Priority: Minor
>              Labels: mapreduce,
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-391.patch, multi-table-if.patch, 
> new-multitable-if.patch
>
>
> Just realized we had no MR input method which supports multiple Tables for an 
> input format. I would see it making the table the mapper's key and making the 
> Key/Value a tuple, or alternatively have the Table/Key be the key tuple and 
> stick with Values being the value.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format

Reply via email to