The class HBaseInputOperator seems to be quite old. HBaseStore seems to provide all the functionality of HBaseInputOperator and more (including Kerberos authentication).
It would be a good idea to avoid using HBaseInputOperator going forward and to use HBaseStore instead. I will also work on abstracting out the HBase input functionality in the HBaseInputOperator, which can be extended by concrete implementations.

-Bhupesh

On Wed, Dec 23, 2015 at 7:47 PM, Bhupesh Chawda <[email protected]> wrote:

> Thanks for the inputs.
> As an input operator, I am targeting just the Scan operation. The Get
> operation may be better supported as a generic operator (like a query
> operator), which I can take up later.
>
> -Bhupesh
>
> On Tue, Dec 22, 2015 at 3:48 PM, Mohit Jotwani <[email protected]>
> wrote:
>
>> +1
>>
>> Regards,
>> Mohit
>>
>> On Tue, Dec 22, 2015 at 11:21 AM, Chinmay Kolhatkar <[email protected]>
>> wrote:
>>
>> > +1 for above.
>> > I see that there is an HbaseGetOperator, but it is abstract and I cannot
>> > find a concrete implementation of it.
>> > Are you going to implement that too?
>> >
>> > Maybe the concrete implementation of HbaseGetOperator should have this.
>> >
>> > Also, I want to mention one thing about scan from my previous experience
>> > with HBase. The HBase client is synchronous.
>> > This means that when you fire a scan call, the function blocks until a
>> > certain number of records are received at the client end.
>> > This causes a lot of problems in the current thread, as it might get
>> > blocked for a long period of time.
>> > Plus, there is always network-related latency to add to the problem.
>> >
>> > The usual way to deal with this is to fire scan-like queries on a
>> > separate thread and then consume the results in the main thread.
>> >
>> > Please take care of this scenario while implementing the scan operator.
>> >
>> > -Chinmay.
>> >
>> >
>> > ~ Chinmay.
>> >
>> > On Tue, Dec 22, 2015 at 11:08 AM, Sandeep Deshmukh <[email protected]>
>> > wrote:
>> >
>> > > +1 for this Bhupesh.
>> > >
>> > > Additionally, I would suggest adding support for:
>> > > 1. Point query
>> > > 2. Returning any row version
>> > >
>> > > The above two are key features of HBase and should be supported.
>> > >
>> > > Regards,
>> > > Sandeep
>> > >
>> > > On Fri, Dec 18, 2015 at 4:39 PM, Bhupesh Chawda <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > The current HBasePOJOInputOperator does not allow us to do the
>> > > > following:
>> > > >
>> > > >    1. Allow us to specify a set of "column family: column" and fetch
>> > > >    data only for these columns.
>> > > >    2. Output format is currently a POJO. We need to have other output
>> > > >    formats such that the "columnFamily:column" representation is
>> > > >    supported. Map / CSV are some of the options.
>> > > >    3. Allow specifying an "end row-key" to stop scanning a table.
>> > > >    4. No metrics.
>> > > >
>> > > > I am planning to add the above functionality to the HBase Input
>> > > > operators.
>> > > > These features may go into the HBaseScanOperator /
>> > > > HBasePOJOInputOperator.
>> > > >
>> > > > Please let me know your comments.
>> > > >
>> > > > Thanks.
>> > > >
>> > > > Bhupesh
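
Chinmay's suggestion in the thread above (fire the blocking scan on a separate thread and consume the results in the main thread) could look roughly like the sketch below. It is only an illustration under stated assumptions: BackgroundScanner, drain, isDone and stop are hypothetical names, not Malhar or HBase APIs, and a plain Iterator stands in for the blocking HBase scanner so that the example runs without an HBase cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical helper (not part of Malhar or the HBase client): reads rows
 * from a possibly blocking source on a worker thread and hands them to the
 * caller through a bounded queue.
 */
class BackgroundScanner<T> {
    private final BlockingQueue<T> buffer;
    private final Thread worker;
    private volatile boolean finished = false;

    BackgroundScanner(Iterator<T> blockingSource, int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                // hasNext()/next() may block on the network; only this thread waits.
                while (blockingSource.hasNext()) {
                    buffer.put(blockingSource.next()); // waits while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop() was called
            } finally {
                finished = true;
            }
        }, "scan-worker");
        worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    /** Non-blocking: returns at most maxRows rows that are already buffered. */
    List<T> drain(int maxRows) {
        List<T> batch = new ArrayList<>();
        buffer.drainTo(batch, maxRows);
        return batch;
    }

    /** True once the source is exhausted (or stopped) and the buffer is empty. */
    boolean isDone() {
        return finished && buffer.isEmpty();
    }

    void stop() {
        worker.interrupt();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a table scan; a real operator would wrap the HBase scanner here.
        Iterator<String> fakeScan =
            Arrays.asList("row1", "row2", "row3", "row4", "row5").iterator();
        BackgroundScanner<String> scanner = new BackgroundScanner<>(fakeScan, 2);
        scanner.start();

        List<String> received = new ArrayList<>();
        while (!scanner.isDone()) {
            received.addAll(scanner.drain(2)); // what an emit call might do per invocation
            Thread.sleep(1);
        }
        System.out.println("rows received: " + received);
    }
}
```

The bounded queue is a deliberate choice in this sketch: if the consumer falls behind, the worker thread waits in put() instead of buffering the whole table in memory, while the consuming thread never waits on the scan itself.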
