Josh Elser created ACCUMULO-3704:
------------------------------------
Summary: Localize client configuration for MapReduce
Key: ACCUMULO-3704
URL: https://issues.apache.org/jira/browse/ACCUMULO-3704
Project: Accumulo
Issue Type: Improvement
Components: client, mapreduce
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Blocker
Fix For: 1.7.0
Backstory: I was running ContinuousVerify on a Kerberized Hadoop node.
The job launched successfully, but the mappers hung, unable to authenticate
with the TabletServers. I knew that I had the configuration (mostly) right,
because the Tool (client code) was able to fetch the split points for the job:
the mappers were just unable to read from Accumulo.
The Tool was able to talk to Accumulo because ACCUMULO_CONF_DIR was correctly
set by config.sh (called from tool.sh). However, environment variables from the
Tool are not passed into the child mappers/reducers. As such, the Mappers could
only guess at a few locations where the client configuration file might be. In
my case, they did not guess correctly. In short:
1. Client launches job with correct environment
2. Mappers reliably fail to talk to Accumulo
[~billie.rinaldi] had the suggestion that we localize the client configuration
in the Job itself. I think the easiest way to do this is to construct a
ClientConfiguration in the Tool, serialize it as a property file and add it to
the distributed cache.
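A minimal sketch of the serialization step, assuming the client settings are available as plain key/value pairs. It uses java.util.Properties as a stand-in for ClientConfiguration; the class name ClientConfLocalizer and the method writeProperties are hypothetical and not part of Accumulo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: writes client settings to a properties file that can ship with a job. */
final class ClientConfLocalizer {

  /**
   * Serializes the given key/value pairs to {@code target} in properties format.
   * The map stands in for the ClientConfiguration built by the Tool (an assumption).
   */
  static Path writeProperties(Map<String, String> clientConf, Path target) {
    Properties props = new Properties();
    props.putAll(clientConf);
    try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
      props.store(out, "Localized Accumulo client configuration");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // In a real Tool the file would next be copied to a filesystem the
    // cluster can read and registered with the job, for example through
    // Hadoop's Job.addCacheFile(URI). That step is omitted here.
    return target;
  }
}
```

The Tool would call this once at submission time, while ACCUMULO_CONF_DIR is still visible, so that the tasks never need to rediscover the configuration on their own.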
Then, when we construct the RecordReader, we can search for that file first
and fall back to loading the default only if it is missing. This should give
users a seamless experience and remove the need for Accumulo configuration on
every YARN node.
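The lookup order on the task side could be sketched as follows. This is an illustration under assumptions, not the actual patch: ClientConfLoader and its load method are hypothetical names, and java.util.Properties again stands in for ClientConfiguration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: prefers the job-localized client config over the default one. */
final class ClientConfLoader {

  /**
   * Returns the properties from {@code localized} if that file is readable,
   * otherwise from {@code fallback}, otherwise an empty Properties object.
   */
  static Properties load(Path localized, Path fallback) {
    if (localized != null && Files.isReadable(localized)) {
      return read(localized);
    }
    if (fallback != null && Files.isReadable(fallback)) {
      return read(fallback);
    }
    return new Properties();
  }

  private static Properties read(Path file) {
    Properties props = new Properties();
    try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      props.load(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }
}
```

Checking the localized file first means a job submitted from a correctly configured client behaves the same on every node, while jobs that do not ship a file keep the existing default behavior.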