[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

Eran Kutner (JIRA) Thu, 23 Jun 2011 00:28:32 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053681#comment-13053681
 ]


Eran Kutner commented on HBASE-3996:
------------------------------------

Thanks stack.

I hope I finally got Eclipse to properly manage the tabs and line lengths (I'm 
not really a Java developer so this is all new to me).

{quote}In TableSplit you create an HTable instance. Do you need to? And when 
you create it, though I believe it will be less of a problem going forward, can 
you use the constructor that takes a Configuration and table name? Is there a 
close in Split interface? If so, you might want to call close of your HTable in 
there. (Where is it used? Each split needs its own HTable?) Use the constructor 
that takes a Configuration here too...{quote}

There are actually two issues here. I added the configuration and closed the 
table in getSplits(); that's the easy one.
An HTable per split is needed because the cluster nodes use it to read the data 
from the split while the job is running. However, in order to support passing 
the configuration, I moved the HTable creation out of TableSplit and into 
MultiTableInputFormatBase. I also modified TableRecordReaderImpl to close the 
table after reading all the records in the split. I believe this is OK, and the 
tests are passing, but the existing single-table implementation didn't do this, 
so I hope I'm not missing (and messing up) anything.
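
To make the TableRecordReaderImpl change concrete, here is a minimal, 
self-contained sketch of the idea. It is not the code from the patch: FakeTable 
and SplitReader are hypothetical stand-ins for HTable and the record reader, 
and the rows are plain strings. It only shows a reader that owns its table and 
closes it once the split has been fully read.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an HTable: a closeable source of rows.
class FakeTable implements Closeable {
    private final List<String> rows;
    boolean closed = false;

    FakeTable(List<String> rows) {
        this.rows = rows;
    }

    Iterator<String> scan() {
        return rows.iterator();
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical reader that owns its table and closes it when the split is exhausted.
class SplitReader implements Closeable {
    private final FakeTable table;
    private final Iterator<String> scanner;
    private String current;

    SplitReader(FakeTable table) {
        this.table = table;
        this.scanner = table.scan();
    }

    // Advances to the next row; closes the table once no rows remain.
    boolean nextKeyValue() {
        if (scanner.hasNext()) {
            current = scanner.next();
            return true;
        }
        current = null;
        close();
        return false;
    }

    String getCurrentValue() {
        return current;
    }

    @Override
    public void close() {
        table.close();
    }
}
```

Closing inside nextKeyValue() means the table is released even if the caller 
never calls close() explicitly. The sketch assumes each reader has its own 
table instance, so closing it cannot affect another split.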

{quote}You don't need the e.printStackTrace in below{quote}
Right, removed and fixed the spelling in the warning.

{quote}By any chance is the code here in MultiTableInputFormatBase where we are 
checking start and end rows copied from elsewhere?{quote}
It's copied from TableInputFormatBase; as I said, my code is closely based on 
the single-table code.

{quote}You remove the hashCode in TableSplit. Should it have one?{quote}
I actually don't know if it needs one or not (it does seem to work fine without 
it), but I didn't remove it intentionally. I wrote my original code against the 
0.90.3 branch and missed this change when I copied it to trunk. It's back now.
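
On whether a split needs a hashCode: the general Java rule is that a class 
overriding equals() should override hashCode() consistently, otherwise 
hash-based collections misbehave. The sketch below shows that pairing for a 
split-like class. SplitKey and its fields are hypothetical and chosen only for 
illustration; this is not the TableSplit code from the patch.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a table split; not the TableSplit from the patch.
final class SplitKey {
    private final byte[] tableName;
    private final byte[] startRow;
    private final byte[] endRow;
    private final String regionLocation;

    SplitKey(byte[] tableName, byte[] startRow, byte[] endRow, String regionLocation) {
        this.tableName = tableName;
        this.startRow = startRow;
        this.endRow = endRow;
        this.regionLocation = regionLocation;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SplitKey)) {
            return false;
        }
        SplitKey other = (SplitKey) o;
        // Byte arrays must be compared by content, not by reference.
        return Arrays.equals(tableName, other.tableName)
            && Arrays.equals(startRow, other.startRow)
            && Arrays.equals(endRow, other.endRow)
            && Objects.equals(regionLocation, other.regionLocation);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals(), so equal splits hash alike.
        int result = Arrays.hashCode(tableName);
        result = 31 * result + Arrays.hashCode(startRow);
        result = 31 * result + Arrays.hashCode(endRow);
        result = 31 * result + Objects.hashCode(regionLocation);
        return result;
    }
}
```

Without the hashCode() override, two splits with identical contents would 
usually land in different buckets of a HashSet or HashMap, even though equals() 
reports them as equal.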

{quote}Otherwise patch looks great. Test too.{quote}
Thanks!

Hope that's it.


> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-3996
>                 URL: https://issues.apache.org/jira/browse/HBASE-3996
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Eran Kutner
>             Fix For: 0.90.4
>
>         Attachments: MultiTableInputFormat.patch, 
> TestMultiTableInputFormat.java.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
