Nick,

many thanks for the pointer. Yeah, TableInputFormat looks like it fits my needs.
I will dig into it. Appreciate the help.

Demai

On Wed, Feb 4, 2015 at 8:13 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:

> Sounds like you want to do a lot of what TableInputFormat facilitates for
> MapReduce programs. You can probably use code from that package to turn a
> Scan into input splits, which contain the region name and RegionServer
> location, and consume those from your custom coordinator.
>
> -n
>
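A stdlib-only sketch of the split computation described above. It assumes hypothetical `Region` and `Split` types standing in for `HRegionInfo`/`ServerName` and a TableSplit, and it uses String row keys instead of HBase's `byte[]` keys, which are compared as unsigned bytes. The sketch intersects the Scan's [startRow, stopRow) with each region's [startKey, endKey) and treats an empty key as unbounded. It illustrates the idea only and is not the actual TableInputFormatBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one HRegionInfo/ServerName pair. Real HBase keys are
// byte[]; Strings are used here for readability. An empty key means "unbounded".
record Region(String name, String startKey, String endKey, String host) {}

// Hypothetical analogue of a TableSplit: one scan range pinned to one host.
record Split(String regionName, String startRow, String stopRow, String host) {}

final class SplitPlanner {
    /** Intersects the scan range [scanStart, scanStop) with every region's key range. */
    static List<Split> plan(List<Region> regions, String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            // Region ends (exclusive) at or before the scan starts: no overlap.
            boolean endsBeforeScan = !r.endKey().isEmpty()
                    && !scanStart.isEmpty()
                    && r.endKey().compareTo(scanStart) <= 0;
            // Region starts at or after the (exclusive) scan stop row: no overlap.
            boolean startsAfterScan = !scanStop.isEmpty()
                    && r.startKey().compareTo(scanStop) >= 0;
            if (endsBeforeScan || startsAfterScan) {
                continue;
            }
            // Split start = the later of the two start keys.
            String start = (scanStart.isEmpty() || r.startKey().compareTo(scanStart) > 0)
                    ? r.startKey() : scanStart;
            // Split stop = the earlier of the two stop keys (empty = unbounded).
            String stop;
            if (scanStop.isEmpty()) {
                stop = r.endKey();
            } else if (r.endKey().isEmpty()) {
                stop = scanStop;
            } else {
                stop = r.endKey().compareTo(scanStop) < 0 ? r.endKey() : scanStop;
            }
            splits.add(new Split(r.name(), start, stop, r.host()));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("r1", "", "g", "host1"),
                new Region("r2", "g", "p", "host2"),
                new Region("r3", "p", "", "host1"));
        // Only r1 and r2 overlap the scan range [c, k).
        for (Split s : plan(regions, "c", "k")) {
            System.out.println(s);
        }
    }
}
```

Each resulting split names a single region and the host serving it, so a coordinator can pass it to the worker running on that host.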
> On Tuesday, February 3, 2015, Demai Ni <nid...@gmail.com> wrote:
>
> > Hi guys,
> >
> > I am looking for a way to read an HBase table through an MPP database
> > (Postgres-XC), and am hoping to get some suggestions to either validate or
> > invalidate the approach.
> >
> > Kind of like Apache Drill, but through PostgreSQL. There is a long story
> > about why Postgres, and how C/C++ will give me headaches for months to
> > come. :-) I will leave it at that for now.
> >
> > The design is to install a distributed Postgres-XC on the same cluster as
> > HBase, so that Postgres' datanodes sit on the same physical nodes as
> > HBase's RegionServers, and to connect to HBase from PostgreSQL through the
> > existing HBase client code.
> >
> > Step 1: At the Postgres coordinator node (similar to the HBase Master), use
> > HTable.getRegionLocations to get all regions of a particular table as a
> > NavigableMap<HRegionInfo, ServerName>.
> > Step 2: Iterate through the above NavigableMap to map each HBase ServerName
> > to a PG-XC datanode. The goal is to let each Postgres datanode handle the
> > regions on its own physical machine.
> > Step 3: The Postgres coordinator node sends the execution plan to the
> > Postgres datanodes through an existing framework called the foreign data
> > wrapper.
> > Step 4: Each Postgres datanode iterates through its assigned regions and
> > opens an HBase Client.Scan() with .setStartRow and .setStopRow, so that it
> > only reads its assigned regions. I was hoping to use HRegionInfo.regionId
> > directly, but cannot find such an API in Client.Scan.
> > Step 5: The Postgres datanode further analyses the retrieved data.
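A minimal sketch of Steps 2 and 4. It assumes each Postgres-XC datanode is identified by the hostname it runs on. `datanodeByHost` is a hypothetical lookup table, and `RegionLoc` is a hypothetical stand-in for one `HRegionInfo`/`ServerName` entry; in real client code the host would come from `ServerName.getHostname()`. Regions on a host with no co-located datanode fall back to the least-loaded datanode. That fallback is one possible policy and is not part of the design above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one entry of NavigableMap<HRegionInfo, ServerName>:
// the region's key range plus the hostname of the RegionServer hosting it.
record RegionLoc(String regionName, String startKey, String endKey, String host) {}

final class RegionAssigner {
    /**
     * Step 2: assign each region to the datanode on the same host. Regions whose
     * host has no datanode go to the datanode with the fewest regions so far, so
     * they are still read, just remotely. Throws if datanodeByHost is empty and a
     * region needs the fallback.
     */
    static Map<String, List<RegionLoc>> assign(List<RegionLoc> regions,
                                               Map<String, String> datanodeByHost) {
        Map<String, List<RegionLoc>> plan = new LinkedHashMap<>();
        for (String dn : datanodeByHost.values()) {
            plan.putIfAbsent(dn, new ArrayList<>());
        }
        for (RegionLoc r : regions) {
            String dn = datanodeByHost.get(r.host());
            if (dn == null) {
                // No co-located datanode: pick the currently least-loaded one.
                dn = plan.entrySet().stream()
                        .min((a, b) -> Integer.compare(a.getValue().size(), b.getValue().size()))
                        .orElseThrow()
                        .getKey();
            }
            plan.get(dn).add(r);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<RegionLoc> regions = List.of(
                new RegionLoc("r1", "", "g", "host1"),
                new RegionLoc("r2", "g", "p", "host2"),
                new RegionLoc("r3", "p", "", "host3"));
        Map<String, String> datanodeByHost = Map.of("host1", "dn1", "host2", "dn2");
        assign(regions, datanodeByHost).forEach((dn, assigned) -> {
            for (RegionLoc r : assigned) {
                // Step 4: the region's own keys become this datanode's scan bounds.
                System.out.printf("%s scans %s [%s, %s)%n",
                        dn, r.regionName(), r.startKey(), r.endKey());
            }
        });
    }
}
```

In Step 4, each assigned region's start and end keys are the values a datanode would pass to `Scan.setStartRow`/`setStopRow` (`withStartRow`/`withStopRow` in HBase 2.x), read from `HRegionInfo.getStartKey()` and `getEndKey()`. An empty end key means the scan runs to the end of the table.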
> >
> > So in short, the architecture leverages the Postgres optimizer to parse
> > the SQL query, and uses the Postgres datanodes as HBase clients to read
> > HBase regions directly and in parallel, in the hope of 1) reading each
> > HRegion locally and 2) leveraging existing HBase filters.
> >
> > On Step 4 above, is there a way to talk to a RegionServer directly without
> > communicating with the HMaster?
> >
> > Similar ideas (Drill for one; how about HP Vertica?) have been brought up
> > and discussed before. So before I head down the same road, can I pick your
> > brain? Please shed some light on this, or stop me from doing something
> > stupid.
> >
> > Many thanks
> >
> > Demai
> >
>
