Re: Using HDFS as a secondary FS

Denis Magda Mon, 14 Dec 2015 00:12:38 -0800

Hi Ivan,

1) Yes, I think it makes sense to keep the old versions of the docs
while an old version may still be in use by someone.

2) Absolutely, the time has come to add a corresponding article on
readme.io. It's not the first time I've seen a question related to HDFS as
a secondary FS. Both before and now, it's not clear to me what exact steps
I should follow to enable such a configuration. Our current suggestions
look like a puzzle. I'll assemble the puzzle on my side and prepare the
article. Ivan, if you don't mind, I would reach out to you directly for
technical assistance if needed.


Regards,
Denis

On 12/14/2015 10:25 AM, Ivan V. wrote:

Hi, Valentin,

1) First of all, note that the author of the question is not using the
latest doc page, but rather
http://apacheignite.gridgain.org/v1.0/docs/igfs-secondary-file-system .
This is version 1.0, while the latest is 1.5:
https://apacheignite.readme.io/docs/hadoop-accelerator. Besides, it turned
out that some links in the latest doc version point to the 1.0 doc
version. I fixed that in the several places where I found it. Do we really
need the old doc versions (1.0 - 1.4)?

2) Our documentation (
http://apacheignite.gridgain.org/docs/secondary-file-system) does not
provide any special setup instructions for configuring HDFS as a secondary
file system in Ignite. Our docs assume that a user who wants to integrate
with Hadoop follows the generic Hadoop integration instructions (e.g.
http://apacheignite.gridgain.org/docs/installing-on-apache-hadoop). It
looks like the page
http://apacheignite.gridgain.org/docs/secondary-file-system should be
clearer about the required configuration steps (in practice, setting the
HADOOP_HOME variable for the Ignite node process).

3) Hadoop jars are correctly found by Ignite if the following conditions
are met:
(a) The "Hadoop Edition" distribution is used (not a "Fabric" edition).
(b) Either HADOOP_HOME environment variable is set up (for Apache Hadoop
distribution), or file "/etc/default/hadoop" exists and matches the Hadoop
distribution used (BigTop, Cloudera, HDP, etc.)
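
As a rough illustration of condition (b) only, here is a minimal Python
sketch. It is not the logic of hadoop-classpath.sh or setenv.sh; the function
name and its return strings are hypothetical, and it does not verify
condition (a) or whether "/etc/default/hadoop" actually matches the
installed distribution.

```python
import os


def hadoop_jars_discoverable(env, defaults_file="/etc/default/hadoop"):
    """Return a short reason if condition (b) appears to hold, else None.

    Hypothetical helper for illustration: it only checks that HADOOP_HOME
    is set to a non-empty value, or that the defaults file exists.
    """
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        return "HADOOP_HOME=" + hadoop_home
    if os.path.isfile(defaults_file):
        return defaults_file + " exists"
    return None


if __name__ == "__main__":
    reason = hadoop_jars_discoverable(os.environ)
    print(reason or "neither HADOOP_HOME nor /etc/default/hadoop found")
```

Running this in the same environment that starts the Ignite node gives a
quick hint about whether either lookup path is available to the node process.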

The exact mechanism of the Hadoop classpath composition can be found in
files
IGNITE_HOME/bin/include/hadoop-classpath.sh
IGNITE_HOME/bin/include/setenv.sh .

The issue is discussed in https://issues.apache.org/jira/browse/IGNITE-372
and https://issues.apache.org/jira/browse/IGNITE-483 .

On Sat, Dec 12, 2015 at 3:45 AM, Valentin Kulichenko <
[email protected]> wrote:

Igniters,

I'm looking at the question on SO [1] and I'm a bit confused.

We ship ignite-hadoop module only in Hadoop Accelerator and without Hadoop
JARs, assuming that user will include them from the Hadoop distribution he
uses. It seems OK for me when accelerator is plugged in to Hadoop to run
mapreduce jobs, but I can't figure out steps required to configure HDFS as
a secondary FS for IGFS. Which Hadoop JARs should be on classpath? Is user
supposed to add them manually?

Can someone with more expertise in our Hadoop integration clarify this? I
believe there is not enough documentation on this topic.

BTW, any ideas why the user gets an exception for the JobConf class, which
is in the 'mapred' package? Why is a map-reduce class being used?

[1]
http://stackoverflow.com/questions/34221355/apache-ignite-what-are-the-dependencies-of-ignitehadoopigfssecondaryfilesystem

-Val
