[
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053681#comment-13053681
]
Eran Kutner commented on HBASE-3996:
------------------------------------
Thanks stack.
I hope I finally got Eclipse to properly manage the tabs and line lengths (I'm
not really a Java developer so this is all new to me).
{quote}In TableSplit you create an HTable instance. Do you need to? And when
you create it, though I believe it will be less of a problem going forward, can
you use the constructor that takes a Configuration and table name? Is there a
close in Split interface? If so, you might want to call close of your HTable in
there. (Where is it used? Each split needs its own HTable?) Use the constructor
that takes a Configuration here too...{quote}
There are actually two issues here. I added the configuration and closed the
table in getSplits(); that's the easy one.
An HTable per split is needed because the cluster nodes use it to read the
split's data while the job is running. However, to support passing the
configuration, I moved the HTable creation out of TableSplit and into
MultiTableInputFormatBase. I also modified TableRecordReaderImpl to close the
table after reading all the records in the split. I believe this is OK, and the
tests pass, but the existing single-table implementation didn't do this, so I
hope I'm not missing (or messing up) anything.
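To illustrate the close-after-reading behaviour described above, here is a minimal, self-contained sketch. It is not the code from the patch: FakeTable and ClosingRecordReader are hypothetical stand-ins for HTable and TableRecordReaderImpl, written only to show a reader closing the table it was given once the split's rows are exhausted.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for an HTable-like resource; NOT the HBase API. */
final class FakeTable implements Closeable {
    private final List<String> rows;
    private boolean closed = false;

    FakeTable(List<String> rows) {
        this.rows = rows;
    }

    /** Returns an iterator over the rows, playing the role of a scanner. */
    Iterator<String> scan() {
        return rows.iterator();
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical reader that closes its table once the split is exhausted. */
final class ClosingRecordReader implements Closeable {
    private final FakeTable table;
    private final Iterator<String> scanner;
    private String current;

    ClosingRecordReader(FakeTable table) {
        this.table = table;
        this.scanner = table.scan();
    }

    /** Advances to the next row; closes the table when no rows remain. */
    boolean nextKeyValue() {
        if (scanner.hasNext()) {
            current = scanner.next();
            return true;
        }
        current = null;
        close();
        return false;
    }

    String getCurrentValue() {
        return current;
    }

    @Override
    public void close() {
        table.close();
    }
}
```

In this sketch the table stays open while rows remain and is closed by the call to nextKeyValue() that first returns false, so each split's table is released without the caller having to remember to do it.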
{quote}You don't need the e.printStackTrace in below{quote}
Right, removed and fixed the spelling in the warning.
{quote}By any chance is the code here in MultiTableInputFormatBase where we are
checking start and end rows copied from elsewhere?{quote}
It's copied from TableInputFormatBase. As I said, my code is closely based on
the single-table code.
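For readers unfamiliar with that kind of start/end row check, the following is a rough sketch of how a scan's row range can be tested against a region's key range when building splits. It is an illustration, not the code from the patch or from TableInputFormatBase: the class and method names are hypothetical, and it assumes that an empty key means "unbounded" and that both ranges are start-inclusive and end-exclusive.

```java
/** Hypothetical helper; names and conventions are assumptions for illustration. */
final class SplitRangeSketch {

    /** Unsigned lexicographic comparison of two row keys. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns true if the scan range [startRow, stopRow) overlaps the region
     * [regionStart, regionEnd). An empty array is treated as unbounded.
     */
    static boolean overlaps(byte[] startRow, byte[] stopRow,
                            byte[] regionStart, byte[] regionEnd) {
        // The scan must begin before the region ends...
        boolean startsBeforeRegionEnd = startRow.length == 0
                || regionEnd.length == 0
                || compareRows(startRow, regionEnd) < 0;
        // ...and must end after the region begins.
        boolean stopsAfterRegionStart = stopRow.length == 0
                || compareRows(stopRow, regionStart) > 0;
        return startsBeforeRegionEnd && stopsAfterRegionStart;
    }
}
```

Only regions for which such a check succeeds would need a split; regions entirely outside the scan's row range can be skipped.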
{quote}You remove the hashCode in TableSplit. Should it have one?{quote}
I actually don't know whether it needs one (it does seem to work fine without
it), but I didn't remove it intentionally. I wrote my original code against the
0.90.3 branch, and when I copied it to trunk I missed this change. It's back
now.
{quote}Otherwise patch looks great. Test too.{quote}
Thanks!
Hope that's it.
> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Eran Kutner
> Fix For: 0.90.4
>
> Attachments: MultiTableInputFormat.patch,
> TestMultiTableInputFormat.java.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple
> scanners on a single table can save a lot of time when running map/reduce
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira