Hi Flavio

Let me try to answer your last question on the user list (to the best of
my HBase knowledge).
"I just wanted to know if and how region splitting is handled. Can you
explain to me in detail how Flink and HBase work together? What is not fully
clear to me is when computation is done by the region servers and when data
starts to flow to a Flink worker (which in my test job is only my PC), and
how to better understand the important logged info to see whether my job is
performing well."

HBase partitions its tables into so-called "regions" of keys and stores the
regions distributed across the cluster using HDFS. I think an HBase region
can be thought of as an HDFS block. To make reading an HBase table
efficient, region reads should be done locally, i.e., an InputFormat should
primarily read regions that are stored on the same machine it is running
on. Flink's InputSplits partition the HBase input by regions and add
information about the storage location of each region. During execution,
input splits are assigned to InputFormats so that reads can be done locally.
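
To illustrate the idea, here is a minimal, hypothetical sketch (not Flink's
actual classes; RegionSplit and pickSplit are names made up for
illustration): each split records the hosts storing its region, and a worker
is handed a local split when one is available, falling back to a remote read
otherwise.

```java
import java.util.*;

// Hypothetical sketch of locality-aware split assignment, in the spirit of
// Flink's handling of HBase regions. Not Flink API: RegionSplit and
// pickSplit are illustrative names only.
public class LocalityDemo {

    // One split per HBase region, carrying the region's key range and the
    // hostnames where the region's HDFS blocks live.
    static final class RegionSplit {
        final byte[] startKey;
        final byte[] endKey;
        final List<String> hostnames;

        RegionSplit(byte[] startKey, byte[] endKey, List<String> hostnames) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.hostnames = hostnames;
        }
    }

    // Given a worker's hostname, prefer a split it can read locally; if none
    // is left, fall back to a remote read of some remaining split.
    static RegionSplit pickSplit(String worker, List<RegionSplit> pending) {
        for (RegionSplit s : pending) {
            if (s.hostnames.contains(worker)) {
                pending.remove(s);
                return s; // local read
            }
        }
        return pending.isEmpty() ? null : pending.remove(0); // remote read
    }

    public static void main(String[] args) {
        List<RegionSplit> splits = new ArrayList<>(List.of(
            new RegionSplit(new byte[]{0}, new byte[]{10}, List.of("node-a", "node-b")),
            new RegionSplit(new byte[]{10}, new byte[]{20}, List.of("node-c"))
        ));
        // node-c gets "its" region first, even though it is listed second.
        System.out.println(pickSplit("node-c", splits).hostnames); // [node-c]
        // Only a remote split remains for node-c afterwards.
        System.out.println(pickSplit("node-c", splits).hostnames); // [node-a, node-b]
    }
}
```

In Flink the scheduler does this matching for you; the point of the sketch
is only that the split carries the host information that makes local reads
possible.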

Best, Fabian

2014-11-03 11:13 GMT+01:00 Stephan Ewen <[email protected]>:

> Hi!
>
> The way of passing parameters through the configuration is very old (the
> original HBase format dates back to that time). I would simply make the
> HBase format take those parameters through the constructor.
>
> Greetings,
> Stephan
>
>
> On Mon, Nov 3, 2014 at 10:59 AM, Flavio Pompermaier <[email protected]>
> wrote:
>
> > The problem is that I also removed the GenericTableOutputFormat because
> > there is an incompatibility between hadoop1 and hadoop2 for class
> > TaskAttemptContext and TaskAttemptContextImpl..
> > then it would be nice if the user didn't have to worry about passing the
> > pact.hbase.jtkey and pact.job.id parameters..
> > I think it is probably a good idea to remove hadoop1 compatibility, keep
> > the HBase addon enabled only for hadoop2 (as before), and decide how to
> > manage those 2 parameters..
> >
> > On Mon, Nov 3, 2014 at 10:19 AM, Stephan Ewen <[email protected]> wrote:
> >
> > > It is fine to remove it, in my opinion.
> > >
> > > On Mon, Nov 3, 2014 at 10:11 AM, Flavio Pompermaier <
> > [email protected]>
> > > wrote:
> > >
> > > > That is one class I removed because it was using the deprecated API
> > > > GenericDataSink..I can restore it, but then it would be a good idea
> > > > to remove those warnings (also because, from what I understood, the
> > > > Record APIs are going to be removed).
> > > >
> > > > On Mon, Nov 3, 2014 at 9:51 AM, Fabian Hueske <[email protected]>
> > > wrote:
> > > >
> > > > > I'm not familiar with the HBase connector code, but are you maybe
> > > looking
> > > > > for the GenericTableOutputFormat?
> > > > >
> > > > > 2014-11-03 9:44 GMT+01:00 Flavio Pompermaier <[email protected]
> >:
> > > > >
> > > > > > I was trying to modify the example setting hbaseDs.output(new
> > > > > > HBaseOutputFormat()); but I can't see any HBaseOutputFormat
> > > > > > class..maybe we shall use another class?
> > > > > >
> > > > > > On Mon, Nov 3, 2014 at 9:39 AM, Flavio Pompermaier <
> > > > [email protected]
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Maybe that's something I could add to the HBase example and
> > > > > > > that could be better documented in the Wiki.
> > > > > > >
> > > > > > > Since we're talking about the wiki..I was looking at the Java API (
> > > > > > > http://flink.incubator.apache.org/docs/0.6-incubating/java_api_guide.html)
> > > > > > > and the link to the KMeans example is not working (where it says
> > > > > > > "For a complete example program, have a look at KMeans Algorithm").
> > > > > > >
> > > > > > > Best,
> > > > > > > Flavio
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 3, 2014 at 9:12 AM, Flavio Pompermaier <
> > > > > [email protected]
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Ah ok, perfect! That was the reason why I removed it :)
> > > > > > >>
> > > > > > >> On Mon, Nov 3, 2014 at 9:10 AM, Stephan Ewen <
> [email protected]>
> > > > > wrote:
> > > > > > >>
> > > > > > >>> You do not really need an HBase data sink. You can call
> > > > > > >>> "DataSet.output(new HBaseOutputFormat())".
> > > > > > >>>
> > > > > > >>> Stephan
> > > > > > >>> Am 02.11.2014 23:05 schrieb "Flavio Pompermaier" <
> > > > > [email protected]
> > > > > > >:
> > > > > > >>>
> > > > > > >>> > Just one last thing..I removed the HbaseDataSink because I
> > > > > > >>> > think it was using the old APIs..can someone help me update
> > > > > > >>> > that class?
> > > > > > >>> >
> > > > > > >>> > On Sun, Nov 2, 2014 at 10:55 AM, Flavio Pompermaier <
> > > > > > >>> [email protected]>
> > > > > > >>> > wrote:
> > > > > > >>> >
> > > > > > >>> > > Indeed this time the build has been successful :)
> > > > > > >>> > >
> > > > > > >>> > > On Sun, Nov 2, 2014 at 10:29 AM, Fabian Hueske <
> > > > > [email protected]
> > > > > > >
> > > > > > >>> > wrote:
> > > > > > >>> > >
> > > > > > >>> > >> You can also set up Travis to build your own Github
> > > > > > >>> > >> repositories by linking it to your Github account. That
> > > > > > >>> > >> way Travis can build all your branches (and you can also
> > > > > > >>> > >> trigger rebuilds if something fails).
> > > > > > >>> > >> Not sure if we can manually retrigger builds on the
> > > > > > >>> > >> Apache repository.
> > > > > > >>> > >>
> > > > > > >>> > >> Support for Hadoop 1 and 2 is indeed a very good addition :-)
> > > > > > >>> > >>
> > > > > > >>> > >> For the discussion about the PR itself, I would need a
> > > > > > >>> > >> bit more time to become more familiar with HBase. I also
> > > > > > >>> > >> do not have an HBase setup available here.
> > > > > > >>> > >> Maybe somebody else from the community who was involved
> > > > > > >>> > >> with a previous version of the HBase connector could
> > > > > > >>> > >> comment on your question.
> > > > > > >>> > >>
> > > > > > >>> > >> Best, Fabian
> > > > > > >>> > >>
> > > > > > >>> > >> 2014-11-02 9:57 GMT+01:00 Flavio Pompermaier <
> > > > > > [email protected]
> > > > > > >>> >:
> > > > > > >>> > >>
> > > > > > >>> > >> > As suggested by Fabian, I moved the discussion to this
> > > > > > >>> > >> > mailing list.
> > > > > > >>> > >> >
> > > > > > >>> > >> > I think that what is still to be discussed is how to
> > > > > > >>> > >> > retrigger the build on Travis (I don't have an account)
> > > > > > >>> > >> > and whether the PR can be integrated.
> > > > > > >>> > >> >
> > > > > > >>> > >> > Maybe what I can do is move the HBase example into the
> > > > > > >>> > >> > test package (right now I left it in the main folder)
> > > > > > >>> > >> > so it will force Travis to rebuild.
> > > > > > >>> > >> > I'll do it within a couple of hours.
> > > > > > >>> > >> >
> > > > > > >>> > >> > Another thing I forgot to say is that the HBase
> > > > > > >>> > >> > extension is now compatible with both hadoop 1 and 2.
> > > > > > >>> > >> >
> > > > > > >>> > >> > Best,
> > > > > > >>> > >> > Flavio
> > > > > > >>> > >>
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
