[ 
https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994598#comment-14994598
 ] 

Haohui Mai commented on HDFS-9117:
----------------------------------

bq. As an example, let's say we are writing a native replacement for the dfs 
tool using the native libhdfs++ codebase (not the libhdfs compatibility layer) 
that can do "-ls" and "-copyFromLocal", etc. To provide Least Astonishment for 
our consumers, they would expect that a properly configured Hadoop node [with 
the HADOOP_HOME pointing to /etc/hadoop-2.9.9 and its config files] could run 
"hdfspp -ls /tmp" and have it automatically find the NN and configure the 
communications parameters correctly to talk to their cluster.

Unfortunately that assumption breaks down in many ways -- the behavior is fully 
implementation defined. For example, it is unclear whether {{HADOOP_HOME}} or 
{{HADOOP_PREFIX}} should be chosen. Configuration files are only required to be 
on the {{CLASSPATH}}, not necessarily in the {{HADOOP_HOME}} directory. 
Different vendors might have changed their scripts and put the configuration in 
different places. Scripts evolve across versions; we have very different 
scripts between trunk and branch-2.

While this is definitely useful in the libhdfs compatibility layer, I'm 
doubtful it should be added into the core part of the library due to all this 
complexity.

Therefore I believe the focus of the library should be providing mechanisms to 
interact with HDFS, not concrete policy on how to interact (e.g., the location 
of the configuration). We don't yet have any library that implements the 
protocols and mechanisms to interact with HDFS (which is the reusable part). 
The policy is highly customized across environments but can be worked around 
easily (which is the less reusable part).

bq. given this context, do you agree that we need to support libhdfs++ 
compatibility with the hdfs-site.xml files that are already deployed at 
customer 

There are two levels of APIs when talking about libhdfs++. The core API 
focuses on providing mechanisms to interact with HDFS, such as the Hadoop RPC 
and DataTransferProtocol implementations. The API you're referring to would be 
a convenience API for libhdfs++. The functionality is definitely helpful, but 
it can be provided as a utility helper instead of being baked into the main 
contract of libhdfs++.

My suggestion is the following:

1. Focus this jira on the code that parses XML from strings (which is the core 
functionality of configuration parsing). It should not contain any file 
operations.
2. Separate the tasks of searching through paths, reading files, etc. into 
different jiras. For now it makes sense to put that code alongside the 
{{libhdfs}} compatibility layer. Since it's an implementation detail, I 
believe we can go through it quickly. At a later point we can promote the code 
into a common library once we have a proposal for what the libhdfs++ 
convenience APIs look like.


> Config file reader / options classes for libhdfs++
> --------------------------------------------------
>
>                 Key: HDFS-9117
>                 URL: https://issues.apache.org/jira/browse/HDFS-9117
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: HDFS-8707
>            Reporter: Bob Hansen
>            Assignee: Bob Hansen
>         Attachments: HDFS-9117.HDFS-8707.001.patch, 
> HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, 
> HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, 
> HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, 
> HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, 
> HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, 
> HDFS-9117.HDFS-9288.007.patch
>
>
> For environmental compatibility with HDFS installations, libhdfs++ should be 
> able to read the configuration from Hadoop XML files and behave in line with 
> the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML 
> configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed 
> to efficiently transport the configuration information within the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
