[
https://issues.apache.org/jira/browse/HBASE-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589972#comment-13589972
]
Nick Dimiduk commented on HBASE-4587:
-------------------------------------
bq. "Being able to have multiple tables as your input path"
bq. "Being able to filter on specific columns/column families".
{{MultiTableInputFormat}} satisfies both of these requests.
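For illustration, here is a minimal sketch of how such a job might be set up. It assumes a release that ships {{MultiTableInputFormat}}, with the HBase and Hadoop client libraries on the classpath. The table names ({{table_a}}, {{table_b}}), the column family ({{cf}}), and the classes {{MultiTableJobSketch}} and {{CellCountMapper}} are placeholders of mine, not anything from this issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

class MultiTableJobSketch {

  // Hypothetical mapper: emits each row key with the number of cells returned.
  static class CellCountMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(Bytes.toStringBinary(value.getRow())),
                    new Text(Integer.toString(value.size())));
    }
  }

  // One Scan per input table; the table name travels as a scan attribute.
  static Scan scanFor(String table, String family) {
    Scan scan = new Scan();
    scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(table));
    scan.addFamily(Bytes.toBytes(family)); // restrict the scan to one column family
    return scan;
  }

  // Placeholder table and family names.
  static List<Scan> buildScans() {
    List<Scan> scans = new ArrayList<>();
    scans.add(scanFor("table_a", "cf"));
    scans.add(scanFor("table_b", "cf"));
    return scans;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "multi-table-sketch");
    job.setJarByClass(MultiTableJobSketch.class);

    // The List<Scan> overload is the one that reads from several tables.
    TableMapReduceUtil.initTableMapperJob(
        buildScans(), CellCountMapper.class, Text.class, Text.class, job);

    job.setNumReduceTasks(0);                         // map-only, for brevity
    job.setOutputFormatClass(NullOutputFormat.class); // discard output in this sketch
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Each {{Scan}} can carry its own column or family restrictions, so the filtering is per table rather than global to the job.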
bq. Providing the source location (table/row/column) to the results
The {{Result}} instances provided to the mapper already carry the row and
column. The table name would be an addition. Perhaps {{map.input.file}} could
be used to deliver it?
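One possible alternative, sketched below, is to read the table name from the input split rather than from a job property. This assumes the splits handed to the mapper are {{TableSplit}} instances, which I believe is the case for {{MultiTableInputFormat}} but have not verified for every release. {{SourceTableMapper}} and its {{tableOf}} helper are hypothetical names.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;

// Hypothetical mapper that tags every row key with the table it came from.
class SourceTableMapper extends TableMapper<Text, Text> {

  private Text table;

  // Assumes the split is a TableSplit; the cast fails for other input formats.
  static String tableOf(InputSplit split) {
    return Bytes.toString(((TableSplit) split).getTableName());
  }

  @Override
  protected void setup(Context context) {
    // A split covers a single table, so resolve the name once per task.
    table = new Text(tableOf(context.getInputSplit()));
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result value, Context context)
      throws IOException, InterruptedException {
    // Emits (table, row); columns remain available through the Result's cells.
    context.write(table, new Text(Bytes.toStringBinary(value.getRow())));
  }
}
```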
bq. Multiple clusters
This is tricky, as it amounts to setting multiple conf objects for the job.
From the client perspective, they could be passed in as a
{{List<Configuration>}}, in the same way {{TableMapReduceUtil.initTableMapperJob}}
already accepts a {{List<Scan>}}. Do you have any ideas on implementation?
bq. Different schemas
This request doesn't make sense to me. What do you mean by different schemas?
> HBase MR support for multiple tables as input
> ---------------------------------------------
>
> Key: HBASE-4587
> URL: https://issues.apache.org/jira/browse/HBASE-4587
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 0.90.3
> Reporter: Rajeev Rao
>
> Some requirements:
> - Being able to have multiple tables as your input path
> - Being able to filter on specific columns/column families
> - Providing the source location (table/row/column) to the results
> - Multiple clusters
> - Different schemas.
> Overall this seems difficult for now so I am going to punt on it. On the
> other hand it would be easy enough to write all of the MR values into an
> intermediate table and then work from there.