paul-rogers commented on pull request #2139:
URL: https://github.com/apache/drill/pull/2139#issuecomment-759855911


   @vdiravka, thank you for taking a look at this issue. It has been a while 
since I wrote this stuff. I've poked around a bit to refresh my memory of what 
we were trying to do.
   
   The goal is to have a config system that draws from both the HDFS config and 
from the DoY config files. When this was written, Drill did not use the HDFS 
config files; the config info was stored by Drill in ZK. Also, during testing, 
we might target any number of HDFS systems, so we had to support more than just 
the local HDFS config file.
   
   Those considerations led us to use a combination of the Drill-style config 
file and the HDFS config. Note that the DoY config is *not* the same as the 
Drill config (though both use the same HOCON library). There is a 
chicken-and-egg problem: for the DoY client, Drill is a zip file; there is no 
actual Drill installed on the machine acting as the DoY client. Also, a single 
DoY client can run any number of Drill instances. Hence, DoY does not read a 
Drill config file; the DoY config is a separate entity.
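   
   To make that concrete: the DoY config is plain HOCON, built by layering a 
site-specific file over the shipped defaults, roughly like the sketch below. 
(This is a simplified illustration, not the actual DoY bootstrap code; the site 
file path is hypothetical.)
   
   ```
   // Simplified sketch of building a DoY-style config: a site HOCON file
   // layered over the shipped defaults. Paths and wiring are illustrative only.
   import java.io.File;
   
   import com.typesafe.config.Config;
   import com.typesafe.config.ConfigFactory;
   
   public class DoyConfigSketch {
     public static void main(String[] args) {
       Config defaults = ConfigFactory.parseResources(
           "org/apache/drill/yarn/core/drill-on-yarn-defaults.conf");
       Config site = ConfigFactory.parseFile(
           new File("/path/to/site/drill-on-yarn.conf")); // hypothetical path
       Config doyConfig = site.withFallback(defaults).resolve();
   
       // The settings discussed below:
       System.out.println(doyConfig.getString("drill.yarn.dfs.connection"));
       System.out.println(doyConfig.getString("drill.yarn.dfs.app-dir"));
     }
   }
   ```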
   
   With that background, let's look at the [config 
file](https://github.com/apache/drill/blob/master/drill-yarn/src/main/resources/org/apache/drill/yarn/core/drill-on-yarn-defaults.conf):
   
   ```
   drill.yarn: {
     app-name: "Drill-on-YARN"
   
     # Settings here support a default single-node cluster on the local host,
     # using the default HDFS connection obtained from the Hadoop config files.
   
     dfs: {
       connection: ""
       app-dir: "/user/drill"
     }
     ...
   ```
   
   Note that `drill.yarn.dfs.connection` is supposed to be the target DFS 
connection. So, if you set that to `hdfs://localhost:9000`, things should work.
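   
   That is, something like this in the DoY config file (if I recall correctly, 
that is `drill-on-yarn.conf` in the site directory, but check your setup):
   
   ```
   drill.yarn: {
     dfs: {
       connection: "hdfs://localhost:9000"
     }
   }
   ```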
   
   You noted that we obtain the information twice. The code in 
[DfsFacade.connect()](https://github.com/apache/drill/blob/master/drill-yarn/src/main/java/org/apache/drill/yarn/core/DfsFacade.java)
 appears correct:
   
   ```
     public void connect() throws DfsFacadeException {
       loadYarnConfig();
       String dfsConnection = config.getString(DrillOnYarnConfig.DFS_CONNECTION);
       try {
         if (DoYUtil.isBlank(dfsConnection)) {
           fs = FileSystem.get(yarnConf);
         } else {
           URI uri;
           try {
             uri = new URI(dfsConnection);
           } catch (URISyntaxException e) {
             throw new DfsFacadeException(
                 "Illformed DFS connection: " + dfsConnection, e);
           }
           fs = FileSystem.get(uri, yarnConf);
         }
       } catch (IOException e) {
         throw new DfsFacadeException("Failed to create the DFS", e);
       }
     }
   ```
   
   We first use the value from the DoY config; if it is not set, we fall back 
to the Hadoop config. That is exactly the behavior we want.
   
   The initial note also mentions `loadYarnConfig()`. This method loads, well, 
the YARN config. The original thought, IIRC, was that the YARN config points to 
the YARN services, not the HDFS services. Also, the idea was that DoY works 
with a single YARN instance, identified by its YARN config. (Though, in 
reality, I suppose there must be only one HDFS per YARN.) So, it should not be 
the case that the YARN config breaks HDFS.
   
   Is it the case, in this instance, that the HDFS server is configured in 
YARN, but not in the HDFS config files?
   
   At this point, I suspect I've exhausted my YARN knowledge. I can, however, 
offer a suggestion.
   
   If the YARN config holds (or reads) the HDFS config more accurately than the 
default HDFS config, then change the `connect()` method above to use the 
(cached) YARN config.
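   
   Roughly something like the following (an untested sketch only; the 
`FileSystem.getDefaultUri()` call is just one way to read "use the cached YARN 
config", and the method otherwise assumes the same `DfsFacade` fields as the 
version quoted above):
   
   ```
     // Sketch of a revised connect(); assumes the same fields as the original.
     public void connect() throws DfsFacadeException {
       loadYarnConfig();
       String dfsConnection = config.getString(DrillOnYarnConfig.DFS_CONNECTION);
       try {
         URI uri;
         if (DoYUtil.isBlank(dfsConnection)) {
           // Resolve fs.defaultFS explicitly from the cached YARN config,
           // rather than relying on FileSystem.get(conf)'s implicit lookup.
           uri = FileSystem.getDefaultUri(yarnConf);
         } else {
           try {
             uri = new URI(dfsConnection);
           } catch (URISyntaxException e) {
             throw new DfsFacadeException(
                 "Illformed DFS connection: " + dfsConnection, e);
           }
         }
         fs = FileSystem.get(uri, yarnConf);
       } catch (IOException e) {
         throw new DfsFacadeException("Failed to create the DFS", e);
       }
     }
   ```
   
   Note that this is close to what `FileSystem.get(yarnConf)` already does, 
which is part of why I'd want to understand where the wrong URI actually comes 
from.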
   
   Before doing this, I'd recommend researching these issues a bit more in the 
YARN and HDFS configs. Maybe that fs URI check in `loadYarnConfig()` is wrong.
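   
   One quick way to do that research (a throwaway check, nothing to commit; the 
property names are the standard Hadoop/YARN keys) is to dump what the config 
files on the DoY client's classpath actually report:
   
   ```
   // Throwaway diagnostic: run on the DoY client machine with the Hadoop
   // config directory on the classpath, and see which values show up.
   import java.net.URI;
   
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.yarn.conf.YarnConfiguration;
   
   public class ConfigCheck {
     public static void main(String[] args) {
       Configuration yarnConf = new YarnConfiguration();
       URI defaultFs = FileSystem.getDefaultUri(yarnConf);
       System.out.println("fs.defaultFS                 = " + yarnConf.get("fs.defaultFS"));
       System.out.println("FileSystem.getDefaultUri     = " + defaultFs);
       System.out.println("yarn.resourcemanager.address = "
           + yarnConf.get("yarn.resourcemanager.address"));
     }
   }
   ```
   
   If `fs.defaultFS` comes back as `file:///` there, the client simply is not 
seeing the HDFS settings in the Hadoop config files, which would explain the 
behavior you saw.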
   
   It may also be that, to use YARN, you must have a valid YARN config and a 
valid HDFS config to go with it, so we might not need the DoY connect config. 
(I suspect this config was added, in part, because MapRFS didn't use the HDFS 
configs, but I could be wrong.)
   
   So, perhaps do a bit more homework, then we can refine the fix.

