+1 for the above. I see that there is an HbaseGetOperator, but it is abstract and I cannot find a concrete implementation of it. Are you going to implement that too?
Maybe the concrete implementation of HbaseGetOperator should have this.

Also, I want to mention one thing about scans from my previous experience with HBase. The HBase client is synchronous: when you fire a scan call, the function blocks until a certain number of records has been received at the client end. This causes a lot of problems, because the calling thread may be blocked for a long time, and network latency adds to the problem. The usual way to deal with this is to fire scan-like queries on a separate thread and then consume the results in the main thread. Please take care of this scenario while implementing the scan operator.

~ Chinmay.

On Tue, Dec 22, 2015 at 11:08 AM, Sandeep Deshmukh <[email protected]> wrote:

> +1 for this Bhupesh.
>
> Additionally, I would suggest to add support for;
> 1. Point query
> 2. Returning any row version
>
> The above two are key features of HBase and should be supported.
>
> Regards,
> Sandeep
>
> On Fri, Dec 18, 2015 at 4:39 PM, Bhupesh Chawda <[email protected]> wrote:
>
> > Hi All,
> >
> > The current HBasePOJOInputOperator does not allow us to do the following:
> >
> > 1. Allow us to specify a set of "column family: column" and fetch data
> > only for these columns.
> > 2. Output format is currently a POJO. We need to have other output
> > formats such that "columnFamily:column" representation is supported.
> > Map / CSV are some of the options.
> > 3. Allow specifying "end row-key" to stop scanning a table.
> > 4. No metrics.
> >
> > I am planning to add the above functionality to the HBase Input operators.
> > These features may go into the HBaseScanOperator / HBasePOJOInputOperator.
> >
> > Please let me know your comments.
> >
> > Thanks.
> >
> > Bhupesh
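
P.S. To illustrate the suggestion above about firing the scan on a separate thread and consuming the results in the main thread, here is a minimal sketch. It does not use any real HBase API: the class name AsyncScanSketch, the Iterator<String> that stands in for a blocking scanner, and the capacity parameter are all hypothetical names chosen for illustration. Error propagation from the reader thread is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a background thread drains a blocking source into a
// bounded queue, and the main (operator) thread polls without blocking.
public class AsyncScanSketch {
  // End-of-scan marker, compared by identity so it cannot clash with a real row.
  private static final String END = new String("END");

  private final BlockingQueue<String> buffer;
  private final Thread reader;
  private boolean done = false; // only touched by the consuming thread

  // blockingSource stands in for a scanner whose next() may block on the network.
  public AsyncScanSketch(Iterator<String> blockingSource, int capacity) {
    this.buffer = new ArrayBlockingQueue<>(capacity);
    this.reader = new Thread(() -> {
      try {
        while (blockingSource.hasNext()) {
          // put() blocks when the queue is full, which bounds memory use.
          buffer.put(blockingSource.next());
        }
        buffer.put(END);
      } catch (InterruptedException e) {
        // stop() was called; restore the flag and let the thread exit.
        Thread.currentThread().interrupt();
      }
    }, "scan-reader");
    this.reader.setDaemon(true);
  }

  public void start() {
    reader.start();
  }

  // Returns up to max rows that are already buffered; never waits for the source.
  public List<String> poll(int max) {
    List<String> rows = new ArrayList<>();
    String row;
    while (rows.size() < max && (row = buffer.poll()) != null) {
      if (row == END) {
        done = true;
        break;
      }
      rows.add(row);
    }
    return rows;
  }

  // True once the end-of-scan marker has been consumed by poll().
  public boolean isDone() {
    return done;
  }

  public void stop() {
    reader.interrupt();
  }
}
```

The operator would call start() once during setup, call poll(n) each time it is asked to emit tuples, and call stop() during teardown. The bounded queue is a design choice: it keeps a slow consumer from letting the reader thread buffer the whole table in memory.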
