Does the HBase jar in the lib folder contain a config that could be used
instead of the config in the job jar file? Or is no config at all available
when the configure() method is called?






--
Fabian Hueske
Phone:      +49 170 5549438
Email:      [email protected]
Web:         http://www.user.tu-berlin.de/fabian.hueske





From: Flavio Pompermaier
Sent: Thursday, 13 November 2014, 21:43
To: [email protected]





The hbase jar is in the lib directory on each node while the config files
are within the jar file I submit from the web client.
On Nov 13, 2014 9:37 PM, <[email protected]> wrote:

> Have you added the hbase.jar file with your HBase config to the ./lib
> folders of your Flink setup (JobManager, TaskManager) or is it bundled with
> your job.jar file?
>
>
>
>
>
> --
> Fabian Hueske
> Phone:      +49 170 5549438
> Email:      [email protected]
> Web:         http://www.user.tu-berlin.de/fabian.hueske
>
>
>
>
>
> From: Flavio Pompermaier
> Sent: Thursday, 13 November 2014, 18:36
> To: [email protected]
>
>
>
>
>
> Any help with this? :(
>
> On Thu, Nov 13, 2014 at 2:06 PM, Flavio Pompermaier <[email protected]>
> wrote:
>
> > We definitely discovered that instantiating HTable and Scan in the
> > configure() method of TableInputFormat causes problems in a distributed
> > environment!
> > If you look at my implementation at
> > https://github.com/fpompermaier/incubator-flink/blob/master/flink-addons/flink-hbase/src/main/java/org/apache/flink/addons/hbase/TableInputFormat.java
> > you can see that Scan and HTable were made transient and are recreated
> > within configure(), but this causes HBaseConfiguration.create() to fail
> > when searching for the config files on the classpath... could you help us
> > understand why?
> >
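> > For reference, this is a minimal sketch of the pattern in question (the
> > table name is illustrative, not the one from the linked class):
> >
> > // HTable and Scan are not serializable, so they are declared transient
> > // and recreated on each worker when configure() is called.
> > private transient HTable table;
> > private transient Scan scan;
> >
> > @Override
> > public void configure(Configuration parameters) {
> >     // HBaseConfiguration.create() searches the classpath of the JVM it
> >     // runs in for hbase-default.xml / hbase-site.xml -- which is exactly
> >     // the step that seems to fail on the cluster.
> >     org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
> >     try {
> >         this.table = new HTable(hConf, "my-table"); // illustrative name
> >     } catch (IOException e) {
> >         throw new RuntimeException("Could not create HTable", e);
> >     }
> >     this.scan = new Scan();
> > }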
> > On Wed, Nov 12, 2014 at 8:10 PM, Flavio Pompermaier <
> [email protected]>
> > wrote:
> >
> >> Usually, when I run a mapreduce job on either Spark or Hadoop, I just put
> >> the *-site.xml files into the war I submit to the cluster and that's it. I
> >> think the problem appeared when I made the HTable a private transient
> >> field and the table instantiation was moved into the configure() method.
> >> Could that be a valid reason? We still have to do a deeper debug, but I'm
> >> trying to figure out where to investigate..
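> >>
> >> One workaround we could try (an untested sketch, the resource name is an
> >> assumption): add the bundled file to the HBase config explicitly instead
> >> of relying on automatic classpath discovery:
> >>
> >> org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
> >> // Resolve hbase-site.xml through the classloader that loaded the job
> >> // classes, i.e., the one that can see into the submitted jar.
> >> java.net.URL hbaseSite = getClass().getClassLoader().getResource("hbase-site.xml");
> >> if (hbaseSite != null) {
> >>     hConf.addResource(hbaseSite);
> >> }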
> >> On Nov 12, 2014 8:03 PM, "Robert Metzger" <[email protected]> wrote:
> >>
> >>> Hi,
> >>> Maybe it's an issue with the classpath? As far as I know, Hadoop reads
> >>> the configuration files from the classpath. Maybe the hbase-site.xml
> >>> file is not accessible through the classpath when running on the cluster?
> >>>
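> >>> A quick way to verify that (just a debugging sketch) would be to log
> >>> from within configure() where -- or whether -- the file is resolved:
> >>>
> >>> // Returns null if hbase-site.xml is not visible on this JVM's classpath.
> >>> java.net.URL cfg = Thread.currentThread().getContextClassLoader()
> >>>         .getResource("hbase-site.xml");
> >>> System.out.println("hbase-site.xml resolved to: " + cfg);
> >>>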
> >>> On Wed, Nov 12, 2014 at 7:40 PM, Flavio Pompermaier <
> >>> [email protected]>
> >>> wrote:
> >>>
> >>> > Today we tried to execute a job on the cluster instead of on the
> >>> > local executor and we faced the problem that the hbase-site.xml was
> >>> > basically ignored. Is there a reason why the TableInputFormat works
> >>> > correctly in the local environment while it doesn't on a cluster?
> >>> > On Nov 10, 2014 10:56 AM, "Fabian Hueske" <[email protected]>
> wrote:
> >>> >
> >>> > > I don't think we need to bundle the HBase input and output format
> in
> >>> a
> >>> > > single PR.
> >>> > > So, I think we can proceed with the IF only and target the OF
> later.
> >>> > > However, the fix for Kryo should be in the master before merging
> the
> >>> PR.
> >>> > > Till is currently working on that and said he expects this to be
> >>> done by
> >>> > > end of the week.
> >>> > >
> >>> > > Cheers, Fabian
> >>> > >
> >>> > >
> >>> > > 2014-11-07 12:49 GMT+01:00 Flavio Pompermaier <
> [email protected]
> >>> >:
> >>> > >
> >>> > > > I also fixed the profile for Cloudera CDH5.1.3. You can build it
> >>> > > > with the command:
> >>> > > >       mvn clean install -Dmaven.test.skip=true -Dhadoop.profile=2 -Pvendor-repos,cdh5.1.3
> >>> > > >
> >>> > > > However, it would be good to generate the specific jar when
> >>> > > > releasing..(e.g.
> >>> > > > flink-addons:flink-hbase:0.8.0-hadoop2-cdh5.1.3-incubating)
> >>> > > >
> >>> > > > Best,
> >>> > > > Flavio
> >>> > > >
> >>> > > > On Fri, Nov 7, 2014 at 12:44 PM, Flavio Pompermaier <
> >>> > > [email protected]>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > I've just updated the code on my fork (synced with the current
> >>> > > > > master and applied the improvements from the comments on the
> >>> > > > > related PR).
> >>> > > > > I still have to understand how to write results back to an
> HBase
> >>> > > > > Sink/OutputFormat...
> >>> > > > >
> >>> > > > >
> >>> > > > > On Mon, Nov 3, 2014 at 12:05 PM, Flavio Pompermaier <
> >>> > > > [email protected]>
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > >> Thanks for the detailed answer. So if I run a job from my
> >>> > > > >> machine, I'll have to download all the scanned data of a
> >>> > > > >> table.. right?
> >>> > > > >>
> >>> > > > >> Still regarding the GenericTableOutputFormat, it is not clear
> >>> > > > >> to me how to proceed..
> >>> > > > >> I saw in the hadoop compatibility addon that such compatibility
> >>> > > > >> can be achieved using the HadoopUtils class, so the open method
> >>> > > > >> should become something like:
> >>> > > > >>
> >>> > > > >> @Override
> >>> > > > >> public void open(int taskNumber, int numTasks) throws IOException {
> >>> > > > >>     if (Integer.toString(taskNumber + 1).length() > 6) {
> >>> > > > >>         throw new IOException("Task id too large.");
> >>> > > > >>     }
> >>> > > > >>     // build a task attempt id of the form "attempt__0000_r_<6-digit, zero-padded task number>_0"
> >>> > > > >>     TaskAttemptID taskAttemptID = TaskAttemptID.forName(
> >>> > > > >>             "attempt__0000_r_" + String.format("%06d", taskNumber + 1) + "_0");
> >>> > > > >>     this.configuration.set("mapred.task.id", taskAttemptID.toString());
> >>> > > > >>     this.configuration.setInt("mapred.task.partition", taskNumber + 1);
> >>> > > > >>     // for hadoop 2.2
> >>> > > > >>     this.configuration.set("mapreduce.task.attempt.id", taskAttemptID.toString());
> >>> > > > >>     this.configuration.setInt("mapreduce.task.partition", taskNumber + 1);
> >>> > > > >>     try {
> >>> > > > >>         this.context = HadoopUtils.instantiateTaskAttemptContext(this.configuration, taskAttemptID);
> >>> > > > >>     } catch (Exception e) {
> >>> > > > >>         throw new RuntimeException(e);
> >>> > > > >>     }
> >>> > > > >>     final HFileOutputFormat2 outFormat = new HFileOutputFormat2();
> >>> > > > >>     try {
> >>> > > > >>         this.writer = outFormat.getRecordWriter(this.context);
> >>> > > > >>     } catch (InterruptedException iex) {
> >>> > > > >>         throw new IOException("Opening the writer was interrupted.", iex);
> >>> > > > >>     }
> >>> > > > >> }
> >>> > > > >>
> >>> > > > >> But I'm not sure about how to pass the JobConf to the class,
> >>> > > > >> whether to merge config files, where HFileOutputFormat2 writes
> >>> > > > >> the data, and how to implement the public void
> >>> > > > >> writeRecord(Record record) API.
> >>> > > > >> Could I have a little chat off the mailing list with the
> >>> > > > >> implementor of this extension?
> >>> > > > >>
> >>> > > > >> On Mon, Nov 3, 2014 at 11:51 AM, Fabian Hueske <
> >>> [email protected]>
> >>> > > > >> wrote:
> >>> > > > >>
> >>> > > > >>> Hi Flavio
> >>> > > > >>>
> >>> > > > >>> let me try to answer your last question on the user's list
> (to
> >>> the
> >>> > > best
> >>> > > > >>> of
> >>> > > > >>> my HBase knowledge).
> >>> > > > >>> "I just wanted to know if and how region splitting is
> >>> > > > >>> handled. Can you explain to me in detail how Flink and HBase
> >>> > > > >>> work together? What is not fully clear to me is when the
> >>> > > > >>> computation is done by the region servers and when the data
> >>> > > > >>> starts to flow to a Flink worker (which in my test job is only
> >>> > > > >>> my PC), and how to understand the important logged info better
> >>> > > > >>> so I can tell whether my job is performing well"
> >>> > > > >>>
> >>> > > > >>> HBase partitions its tables into so-called "regions" of keys
> >>> > > > >>> and stores the regions distributed across the cluster using
> >>> > > > >>> HDFS. I think an HBase region can be thought of as an HDFS
> >>> > > > >>> block. To make reading an HBase table efficient, regions
> >>> > > > >>> should be read locally, i.e., an InputFormat should primarily
> >>> > > > >>> read regions that are stored on the same machine the IF is
> >>> > > > >>> running on.
> >>> > > > >>> Flink's InputSplits partition the HBase input by regions and
> >>> > > > >>> add information about the storage location of each region.
> >>> > > > >>> During execution, input splits are assigned to InputFormats
> >>> > > > >>> that can do local reads.
> >>> > > > >>>
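> >>> > > > >>> To illustrate the idea (a simplified sketch, not the actual
> >>> > > > >>> Flink code): a split carries the key range of one region plus
> >>> > > > >>> the hostnames of its region servers, so the scheduler can
> >>> > > > >>> assign it to a task running on one of those machines:
> >>> > > > >>>
> >>> > > > >>> // Simplified, locality-aware split for one HBase region.
> >>> > > > >>> public class TableInputSplit extends LocatableInputSplit {
> >>> > > > >>>     private final byte[] startRow; // first key of the region
> >>> > > > >>>     private final byte[] endRow;   // first key of the next region
> >>> > > > >>>
> >>> > > > >>>     public TableInputSplit(int splitNumber, String[] hostnames,
> >>> > > > >>>             byte[] startRow, byte[] endRow) {
> >>> > > > >>>         super(splitNumber, hostnames); // hostnames enable local reads
> >>> > > > >>>         this.startRow = startRow;
> >>> > > > >>>         this.endRow = endRow;
> >>> > > > >>>     }
> >>> > > > >>> }
> >>> > > > >>>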
> >>> > > > >>> Best, Fabian
> >>> > > > >>>
> >>> > > > >>> 2014-11-03 11:13 GMT+01:00 Stephan Ewen <[email protected]>:
> >>> > > > >>>
> >>> > > > >>> > Hi!
> >>> > > > >>> >
> >>> > > > >>> > The way of passing parameters through the configuration is
> >>> > > > >>> > very old (the original HBase format dates back to that
> >>> > > > >>> > time). I would simply make the HBase format take those
> >>> > > > >>> > parameters through the constructor.
> >>> > > > >>> >
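> >>> > > > >>> > Something along these lines (just a sketch, the names are
> >>> > > > >>> > illustrative):
> >>> > > > >>> >
> >>> > > > >>> > // Parameters passed via the constructor are serialized with
> >>> > > > >>> > // the input format object and shipped to the workers, so no
> >>> > > > >>> > // Configuration plumbing is needed.
> >>> > > > >>> > public class MyHBaseInputFormat /* extends the HBase IF */ {
> >>> > > > >>> >     private final String tableName;
> >>> > > > >>> >
> >>> > > > >>> >     public MyHBaseInputFormat(String tableName) {
> >>> > > > >>> >         this.tableName = tableName; // set at job assembly time
> >>> > > > >>> >     }
> >>> > > > >>> > }
> >>> > > > >>> >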
> >>> > > > >>> > Greetings,
> >>> > > > >>> > Stephan
> >>> > > > >>> >
> >>> > > > >>> >
> >>> > > > >>> > On Mon, Nov 3, 2014 at 10:59 AM, Flavio Pompermaier <
> >>> > > > >>> [email protected]>
> >>> > > > >>> > wrote:
> >>> > > > >>> >
> >>> > > > >>> > > The problem is that I also removed the
> >>> > > > >>> > > GenericTableOutputFormat because there is an
> >>> > > > >>> > > incompatibility between hadoop1 and hadoop2 for the
> >>> > > > >>> > > classes TaskAttemptContext and TaskAttemptContextImpl..
> >>> > > > >>> > > Then it would be nice if the user didn't have to worry
> >>> > > > >>> > > about passing the pact.hbase.jtkey and pact.job.id
> >>> > > > >>> > > parameters..
> >>> > > > >>> > > I think it is probably a good idea to drop hadoop1
> >>> > > > >>> > > compatibility and keep the HBase addon enabled only for
> >>> > > > >>> > > hadoop2 (as before), and to decide how to manage those 2
> >>> > > > >>> > > parameters..
> >>> > > > >>> > >
> >>> > > > >>> > > On Mon, Nov 3, 2014 at 10:19 AM, Stephan Ewen <
> >>> > [email protected]>
> >>> > > > >>> wrote:
> >>> > > > >>> > >
> >>> > > > >>> > > > It is fine to remove it, in my opinion.
> >>> > > > >>> > > >
> >>> > > > >>> > > > On Mon, Nov 3, 2014 at 10:11 AM, Flavio Pompermaier <
> >>> > > > >>> > > [email protected]>
> >>> > > > >>> > > > wrote:
> >>> > > > >>> > > >
> >>> > > > >>> > > > > That is one class I removed because it was using the
> >>> > > > >>> > > > > deprecated GenericDataSink API.. I can restore it, but
> >>> > > > >>> > > > > then it would be a good idea to remove those warnings
> >>> > > > >>> > > > > (also because, from what I understood, the Record APIs
> >>> > > > >>> > > > > are going to be removed).
> >>> > > > >>> > > > >
> >>> > > > >>> > > > > On Mon, Nov 3, 2014 at 9:51 AM, Fabian Hueske <
> >>> > > > >>> [email protected]>
> >>> > > > >>> > > > wrote:
> >>> > > > >>> > > > >
> >>> > > > >>> > > > > > I'm not familiar with the HBase connector code, but
> >>> > > > >>> > > > > > are you maybe looking for the
> >>> > > > >>> > > > > > GenericTableOutputFormat?
> >>> > > > >>> > > > > >
> >>> > > > >>> > > > > > 2014-11-03 9:44 GMT+01:00 Flavio Pompermaier <
> >>> > > > >>> [email protected]
> >>> > > > >>> > >:
> >>> > > > >>> > > > > >
> >>> > > > >>> > > > > > > I was trying to modify the example by setting
> >>> > > > >>> > > > > > > hbaseDs.output(new HBaseOutputFormat()); but I
> >>> > > > >>> > > > > > > can't see any HBaseOutputFormat class.. maybe we
> >>> > > > >>> > > > > > > should use another class?
> >>> > > > >>> > > > > > >
> >>> > > > >>> > > > > > > On Mon, Nov 3, 2014 at 9:39 AM, Flavio
> Pompermaier
> >>> <
> >>> > > > >>> > > > > [email protected]
> >>> > > > >>> > > > > > >
> >>> > > > >>> > > > > > > wrote:
> >>> > > > >>> > > > > > >
> >>> > > > >>> > > > > > > > Maybe that's something I could add to the HBase
> >>> > > > >>> > > > > > > > example and that could be better documented in
> >>> > > > >>> > > > > > > > the Wiki.
> >>> > > > >>> > > > > > > >
> >>> > > > >>> > > > > > > > Since we're talking about the wiki.. I was
> >>> > > > >>> > > > > > > > looking at the Java API (
> >>> > > > >>> > > > > > > > http://flink.incubator.apache.org/docs/0.6-incubating/java_api_guide.html
> >>> > > > >>> > > > > > > > ) and the link to the KMeans example is not
> >>> > > > >>> > > > > > > > working (where it says "For a complete example
> >>> > > > >>> > > > > > > > program, have a look at KMeans Algorithm").
> >>> > > > >>> > > > > > > >
> >>> > > > >>> > > > > > > > Best,
> >>> > > > >>> > > > > > > > Flavio
> >>> > > > >>> > > > > > > >
> >>> > > > >>> > > > > > > >
> >>> > > > >>> > > > > > > > On Mon, Nov 3, 2014 at 9:12 AM, Flavio
> >>> Pompermaier <
> >>> > > > >>> > > > > > [email protected]
> >>> > > > >>> > > > > > > >
> >>> > > > >>> > > > > > > > wrote:
> >>> > > > >>> > > > > > > >
> >>> > > > >>> > > > > > > >> Ah ok, perfect! That was the reason why I
> >>> removed it
> >>> > > :)
> >>> > > > >>> > > > > > > >>
> >>> > > > >>> > > > > > > >> On Mon, Nov 3, 2014 at 9:10 AM, Stephan Ewen <
> >>> > > > >>> > [email protected]>
> >>> > > > >>> > > > > > wrote:
> >>> > > > >>> > > > > > > >>
> >>> > > > >>> > > > > > > >>> You do not really need an HBase data sink.
> >>> > > > >>> > > > > > > >>> You can call
> >>> > > > >>> > > > > > > >>> "DataSet.output(new HBaseOutputFormat())"
> >>> > > > >>> > > > > > > >>>
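> >>> > > > >>> > > > > > > >>> For example (a hypothetical usage sketch --
> >>> > > > >>> > > > > > > >>> HBaseOutputFormat is exactly the class that
> >>> > > > >>> > > > > > > >>> still needs to be written):
> >>> > > > >>> > > > > > > >>>
> >>> > > > >>> > > > > > > >>> ExecutionEnvironment env =
> >>> > > > >>> > > > > > > >>>         ExecutionEnvironment.getExecutionEnvironment();
> >>> > > > >>> > > > > > > >>> DataSet<Tuple2<String, Integer>> result =
> >>> > > > >>> > > > > > > >>>         env.fromElements(new Tuple2<String, Integer>("rowKey1", 42)); // toy data
> >>> > > > >>> > > > > > > >>> result.output(new HBaseOutputFormat()); // no dedicated sink class needed
> >>> > > > >>> > > > > > > >>> env.execute();
> >>> > > > >>> > > > > > > >>>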
> >>> > > > >>> > > > > > > >>> Stephan
> >>> > > > >>> > > > > > > >>> On 02.11.2014 23:05, "Flavio Pompermaier"
> >>> > > > >>> > > > > > > >>> <[email protected]> wrote:
> >>> > > > >>> > > > > > > >>>
> >>> > > > >>> > > > > > > >>> > Just one last thing.. I removed the
> >>> > > > >>> > > > > > > >>> > HbaseDataSink because I think it was using
> >>> > > > >>> > > > > > > >>> > the old APIs.. can someone help me with
> >>> > > > >>> > > > > > > >>> > updating that class?
> >>> > > > >>> > > > > > > >>> >
> >>> > > > >>> > > > > > > >>> > On Sun, Nov 2, 2014 at 10:55 AM, Flavio
> >>> > > Pompermaier <
> >>> > > > >>> > > > > > > >>> [email protected]>
> >>> > > > >>> > > > > > > >>> > wrote:
> >>> > > > >>> > > > > > > >>> >
> >>> > > > >>> > > > > > > >>> > > Indeed, this time the build has been
> >>> > > > >>> > > > > > > >>> > > successful :)
> >>> > > > >>> > > > > > > >>> > >
> >>> > > > >>> > > > > > > >>> > > On Sun, Nov 2, 2014 at 10:29 AM, Fabian
> >>> Hueske
> >>> > <
> >>> > > > >>> > > > > > [email protected]
> >>> > > > >>> > > > > > > >
> >>> > > > >>> > > > > > > >>> > wrote:
> >>> > > > >>> > > > > > > >>> > >
> >>> > > > >>> > > > > > > >>> > >> You can also set up Travis to build your
> >>> > > > >>> > > > > > > >>> > >> own Github repositories by linking it to
> >>> > > > >>> > > > > > > >>> > >> your Github account. That way Travis can
> >>> > > > >>> > > > > > > >>> > >> build all your branches (and you can also
> >>> > > > >>> > > > > > > >>> > >> trigger rebuilds if something fails).
> >>> > > > >>> > > > > > > >>> > >> Not sure if we can manually retrigger
> >>> > > > >>> > > > > > > >>> > >> builds on the Apache repository.
> >>> > > > >>> > > > > > > >>> > >>
> >>> > > > >>> > > > > > > >>> > >> Support for Hadoop 1 and 2 is indeed a
> >>> > > > >>> > > > > > > >>> > >> very good addition :-)
> >>> > > > >>> > > > > > > >>> > >>
> >>> > > > >>> > > > > > > >>> > >> For the discussion about the PR itself,
> >>> > > > >>> > > > > > > >>> > >> I would need a bit more time to become
> >>> > > > >>> > > > > > > >>> > >> more familiar with HBase. I also don't
> >>> > > > >>> > > > > > > >>> > >> have an HBase setup available here.
> >>> > > > >>> > > > > > > >>> > >> Maybe somebody else from the community
> >>> > > > >>> > > > > > > >>> > >> who was involved with a previous version
> >>> > > > >>> > > > > > > >>> > >> of the HBase connector could comment on
> >>> > > > >>> > > > > > > >>> > >> your question.
> >>> > > > >>> > > > > > > >>> > >>
> >>> > > > >>> > > > > > > >>> > >> Best, Fabian
> >>> > > > >>> > > > > > > >>> > >>
> >>> > > > >>> > > > > > > >>> > >> 2014-11-02 9:57 GMT+01:00 Flavio
> >>> Pompermaier <
> >>> > > > >>> > > > > > > [email protected]
> >>> > > > >>> > > > > > > >>> >:
> >>> > > > >>> > > > > > > >>> > >>
> >>> > > > >>> > > > > > > >>> > >> > As suggested by Fabian, I moved the
> >>> > > > >>> > > > > > > >>> > >> > discussion to this mailing list.
> >>> > > > >>> > > > > > > >>> > >> >
> >>> > > > >>> > > > > > > >>> > >> > I think what still has to be discussed
> >>> > > > >>> > > > > > > >>> > >> > is how to retrigger the build on Travis
> >>> > > > >>> > > > > > > >>> > >> > (I don't have an account) and whether
> >>> > > > >>> > > > > > > >>> > >> > the PR can be integrated.
> >>> > > > >>> > > > > > > >>> > >> >
> >>> > > > >>> > > > > > > >>> > >> > Maybe what I can do is move the HBase
> >>> > > > >>> > > > > > > >>> > >> > example into the test package (right
> >>> > > > >>> > > > > > > >>> > >> > now I left it in the main folder) so it
> >>> > > > >>> > > > > > > >>> > >> > will force Travis to rebuild.
> >>> > > > >>> > > > > > > >>> > >> > I'll do it within a couple of hours.
> >>> > > > >>> > > > > > > >>> > >> >
> >>> > > > >>> > > > > > > >>> > >> > Another thing I forgot to say is that
> >>> > > > >>> > > > > > > >>> > >> > the hbase extension is now compatible
> >>> > > > >>> > > > > > > >>> > >> > with both hadoop 1 and 2.
> >>> > > > >>> > > > > > > >>> > >> >
> >>> > > > >>> > > > > > > >>> > >> > Best,
> >>> > > > >>> > > > > > > >>> > >> > Flavio
