Ivan, I think this should be documented, no?

On Mon, Dec 14, 2015 at 2:25 AM, Ivan V. <iveselovs...@gridgain.com> wrote:
> To enable just IGFS persistence there is no need to use HDFS (that
> requires a Hadoop dependency, a configured HDFS cluster, etc.). We have
> requests https://issues.apache.org/jira/browse/IGNITE-1120 and
> https://issues.apache.org/jira/browse/IGNITE-1926 to implement
> persistence on top of the local file system, and we are already close to
> a solution.
>
> Regarding the secondary FS doc page
> (http://apacheignite.gridgain.org/docs/secondary-file-system), I would
> suggest adding the following text there:
> ------------------------
> If an Ignite node with a secondary file system is configured on a machine
> with a Hadoop distribution, make sure Ignite is able to find the
> appropriate Hadoop libraries: set the HADOOP_HOME environment variable
> for the Ignite process if you're using the Apache Hadoop distribution,
> or, if you use another distribution (HDP, Cloudera, BigTop, etc.), make
> sure the /etc/default/hadoop file exists and has appropriate contents.
>
> If an Ignite node with a secondary file system is configured on a machine
> without a Hadoop distribution, you can manually add the necessary Hadoop
> dependencies to the Ignite node classpath: these are the dependencies of
> groupId "org.apache.hadoop" listed in the file modules/hadoop/pom.xml.
> Currently they are:
>
> 1. hadoop-annotations
> 2. hadoop-auth
> 3. hadoop-common
> 4. hadoop-hdfs
> 5. hadoop-mapreduce-client-common
> 6. hadoop-mapreduce-client-core
>
> ------------------------
>
> On Mon, Dec 14, 2015 at 11:21 AM, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > Guys,
> >
> > Why don't we include the ignite-hadoop module in Fabric? This user
> > simply wants to configure HDFS as a secondary file system to ensure
> > persistence. Not having the opportunity to do this in Fabric looks
> > weird to me. And actually I don't think this is a use case for Hadoop
> > Accelerator.
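The manual-classpath option Ivan describes above can be sketched in shell. This is only an illustration: the jar directory /opt/hadoop-libs and version 2.7.1 are made-up placeholders, not values from the thread.

```shell
# Sketch only: collect the six org.apache.hadoop dependencies listed above
# into a classpath fragment for an Ignite node without a local Hadoop
# distribution. HADOOP_LIBS and HADOOP_VERSION are placeholder values.
HADOOP_LIBS=/opt/hadoop-libs
HADOOP_VERSION=2.7.1

CP=""
for jar in hadoop-annotations hadoop-auth hadoop-common hadoop-hdfs \
           hadoop-mapreduce-client-common hadoop-mapreduce-client-core; do
  CP="$CP:$HADOOP_LIBS/$jar-$HADOOP_VERSION.jar"
done
CP=${CP#:}   # drop the leading ':'

echo "$CP"
```

In practice, simply copying these jars into IGNITE_HOME/libs should also work, since the Ignite startup scripts pick up everything in that directory.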
> >
> > -Val
> >
> > On Mon, Dec 14, 2015 at 12:11 AM, Denis Magda <dma...@gridgain.com>
> > wrote:
> >
> > > Hi Ivan,
> > >
> > > 1) Yes, I think it makes sense to keep the old versions of the docs
> > > while an old version is still considered to be in use by someone.
> > >
> > > 2) Absolutely, the time to add a corresponding article on readme.io
> > > has come. It's not the first time I've seen a question related to
> > > HDFS as a secondary FS. Before, and still now, it's not clear to me
> > > what exact steps I should follow to enable such a configuration. Our
> > > current suggestions look like a puzzle. I'll assemble the puzzle on
> > > my side and prepare the article. Ivan, if you don't mind, I'll reach
> > > out to you directly for technical assistance if needed.
> > >
> > > Regards,
> > > Denis
> > >
> > >
> > > On 12/14/2015 10:25 AM, Ivan V. wrote:
> > >
> > >> Hi, Valentin,
> > >>
> > >> 1) First of all, note that the author of the question is not using
> > >> the latest doc page, namely
> > >> http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system.
> > >> This is version 1.0, while the latest is 1.5:
> > >> https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it
> > >> turned out that some links from the latest doc version point to the
> > >> 1.0 doc version. I fixed that in the several places where I found
> > >> it. Do we really need the old doc versions (1.0-1.4)?
> > >>
> > >> 2) Our documentation
> > >> (http://apacheignite.gridgain.org/docs/secondary-file-system) does
> > >> not provide any special setup instructions to configure HDFS as a
> > >> secondary file system in Ignite. Our docs assume that if a user
> > >> wants to integrate with Hadoop, (s)he follows the generic Hadoop
> > >> integration instructions (e.g.
> > >> http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop).
> > >> It looks like the page
> > >> http://apacheignite.gridgain.org/docs/secondary-file-system should
> > >> be clearer about the required configuration steps (in fact, setting
> > >> the HADOOP_HOME variable for the Ignite node process).
> > >>
> > >> 3) Hadoop jars are correctly found by Ignite if the following
> > >> conditions are met:
> > >> (a) The "Hadoop Edition" distribution is used (not the "Fabric"
> > >> edition).
> > >> (b) Either the HADOOP_HOME environment variable is set (for the
> > >> Apache Hadoop distribution), or the file /etc/default/hadoop exists
> > >> and matches the Hadoop distribution used (BigTop, Cloudera, HDP,
> > >> etc.).
> > >>
> > >> The exact mechanism of the Hadoop classpath composition can be
> > >> found in the files
> > >> IGNITE_HOME/bin/include/hadoop-classpath.sh
> > >> IGNITE_HOME/bin/include/setenv.sh
> > >>
> > >> The issue is discussed in
> > >> https://issues.apache.org/jira/browse/IGNITE-372 and
> > >> https://issues.apache.org/jira/browse/IGNITE-483.
> > >>
> > >> On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
> > >> valentin.kuliche...@gmail.com> wrote:
> > >>
> > >>> Igniters,
> > >>>
> > >>> I'm looking at the question on SO [1] and I'm a bit confused.
> > >>>
> > >>> We ship the ignite-hadoop module only in Hadoop Accelerator and
> > >>> without Hadoop JARs, assuming that the user will include them from
> > >>> the Hadoop distribution he uses. That seems OK to me when the
> > >>> accelerator is plugged into Hadoop to run MapReduce jobs, but I
> > >>> can't figure out the steps required to configure HDFS as a
> > >>> secondary FS for IGFS. Which Hadoop JARs should be on the
> > >>> classpath? Is the user supposed to add them manually?
> > >>>
> > >>> Can someone with more expertise in our Hadoop integration clarify
> > >>> this? I believe there is not enough documentation on this topic.
> > >>>
> > >>> BTW, any ideas why the user gets an exception for the JobConf
> > >>> class, which is in the 'mapred' package?
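As for the configuration itself, pointing IGFS at HDFS as a secondary file system is done in the Ignite Spring XML. A minimal sketch follows; the HDFS URI hdfs://hdfs-host:9000/ is a placeholder for your NameNode address, and depending on the Ignite version you may also need to declare the IGFS meta/data caches:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="fileSystemConfiguration">
    <list>
      <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
        <property name="name" value="igfs"/>
        <!-- Reads/writes pass through to this HDFS cluster. -->
        <property name="secondaryFileSystem">
          <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
            <!-- Placeholder URI; point at your HDFS NameNode. -->
            <constructor-arg value="hdfs://hdfs-host:9000/"/>
          </bean>
        </property>
      </bean>
    </list>
  </property>
</bean>
```

With this in place, the Hadoop jars discussed in this thread must be on the Ignite node's classpath for the secondary file system class to load.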
> > >>> Why is a MapReduce class being used?
> > >>>
> > >>> [1]
> > >>> http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem
> > >>>
> > >>> -Val
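The lookup order Ivan describes in point (3) can be sketched as follows. This is a hypothetical simplification, not the actual contents of the scripts (the real logic lives in IGNITE_HOME/bin/include/hadoop-classpath.sh and setenv.sh):

```shell
# Sketch of the Hadoop-location lookup order described above:
# honor HADOOP_HOME if set (Apache Hadoop distribution); otherwise fall
# back to /etc/default/hadoop (HDP, Cloudera, BigTop, etc.).
resolve_hadoop_home() {
  if [ -n "$HADOOP_HOME" ]; then
    echo "$HADOOP_HOME"
  elif [ -f /etc/default/hadoop ]; then
    # Distribution-specific file expected to export HADOOP_HOME.
    . /etc/default/hadoop
    echo "$HADOOP_HOME"
  else
    echo "Hadoop installation not found; set HADOOP_HOME" >&2
    return 1
  fi
}
```

If neither condition holds, the node cannot compose a Hadoop classpath, which matches the symptoms in the SO question.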