On Mon, Feb 11, 2019 at 2:18 PM Schile,Nathan
<nathan.sch...@cerner.com.invalid> wrote:

> You would compare the webhdfs addresses from
> DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme()) to the hdfs
> addresses from FSHDFSUtils.getNNAddresses(desFs, conf) and see if there is
> an intersection, something like the code below. My assumption is that the
> same host runs both hdfs and webhdfs. Is my understanding correct?
>
Mostly, yes. WebHDFS is served by the NameNodes themselves (HttpFS is another story, though).
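Because the same NameNode process serves both RPC and WebHDFS on different ports, only the host part of each address can be compared. Host strings can still disagree, though: by default Hadoop may build token service names from IP addresses (hadoop.security.token.service.use_ip), while configured addresses are often hostnames. A minimal JDK-only sketch, assuming it is acceptable to resolve hosts at decision time, normalizes both sides to IP text before intersecting. HostOverlap and its methods are hypothetical helpers, not HBase or Hadoop APIs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of HBase/Hadoop): decides whether two host lists
// point at a common NameNode, ignoring ports.
public final class HostOverlap {
  private HostOverlap() {}

  // Maps each host to the IP text of its first resolved address so that
  // "nn1.example.com" and "10.0.0.5" can match; a host that fails to
  // resolve is kept as its lower-cased name instead.
  static Set<String> normalize(Collection<String> hosts) {
    Set<String> out = new HashSet<>();
    for (String host : hosts) {
      try {
        out.add(InetAddress.getByName(host).getHostAddress());
      } catch (UnknownHostException e) {
        out.add(host.toLowerCase(Locale.ROOT));
      }
    }
    return out;
  }

  // True if any WebHDFS host is also one of the HDFS (RPC) NameNode hosts.
  // Ports are deliberately not compared: one NameNode serves both protocols.
  static boolean sharesNameNode(Collection<String> webHdfsHosts, Collection<String> hdfsHosts) {
    return !Collections.disjoint(normalize(webHdfsHosts), normalize(hdfsHosts));
  }
}
```

A multi-homed NameNode that answers on different IPs under different names would still be missed by this check, so a false result should only mean "fall back to copying", never "these are different clusters".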


>
> public static boolean isCoercibleToHdfs(Configuration conf, FileSystem srcFs,
>     FileSystem desFs) {
>   if (isSameHdfs(conf, srcFs, desFs)) {
>     return true;
>   }
>
>   if (srcFs instanceof WebHdfsFileSystem && desFs instanceof DistributedFileSystem) {
>     String srcServiceName = srcFs.getCanonicalServiceName();
>     String desServiceName = desFs.getCanonicalServiceName();
>
>     if (srcServiceName == null || desServiceName == null) {
>       return false;
>     }
>
>     // Only compare hostnames since the ports used by webhdfs and hdfs are different.
>     Set<String> webhdfsHostnames = new HashSet<>();
>     if (srcServiceName.startsWith("ha-webhdfs") || srcServiceName.startsWith("ha-swebhdfs")) {
>       // HA: look up every NameNode configured for the logical nameservice.
>       Map<String, Map<String, InetSocketAddress>> haNnWebHdfsAddresses =
>           DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme());
>       String nameService = srcServiceName.substring(srcServiceName.indexOf(":") + 1);
>       Map<String, InetSocketAddress> nnMap = haNnWebHdfsAddresses.get(nameService);
>       if (nnMap != null) {
>         for (InetSocketAddress addr : nnMap.values()) {
>           webhdfsHostnames.add(addr.getHostString());
>         }
>       }
>     } else {
>       // Non-HA: keep the host part of "host:port".
>       webhdfsHostnames.add(srcServiceName.split(":")[0]);
>     }
>
>     // These are the destination (HDFS) NameNode addresses.
>     Set<String> hdfsHostnames = new HashSet<>();
>     Set<InetSocketAddress> desAddrs = getNNAddresses((DistributedFileSystem) desFs, conf);
>     for (InetSocketAddress address : desAddrs) {
>       hdfsHostnames.add(address.getHostString());
>     }
>
>     return !Sets.intersection(webhdfsHostnames, hdfsHostnames).isEmpty();
>   }
>   return false;
> }
>
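The service-name branch in the snippet above assumes HA names of the form ha-webhdfs:<nameservice> (or ha-swebhdfs:<nameservice>) and plain host:port otherwise. A JDK-only sketch of just that parsing step follows; it takes the nameservice -> NameNode id -> address map (the shape DFSUtil.getHaNnWebHdfsAddresses returns) as an ordinary argument so the logic can be exercised without a cluster. WebHdfsHosts.hostsFor is a hypothetical helper, and it uses lastIndexOf(':') so only the trailing port is stripped.

```java
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of HBase/Hadoop), assuming the service-name
// formats described above.
public final class WebHdfsHosts {
  private WebHdfsHosts() {}

  // Collects the NameNode hosts behind a WebHDFS canonical service name.
  static Set<String> hostsFor(String serviceName,
      Map<String, Map<String, InetSocketAddress>> haAddresses) {
    Set<String> hosts = new HashSet<>();
    if (serviceName.startsWith("ha-webhdfs:") || serviceName.startsWith("ha-swebhdfs:")) {
      // Logical HA name: everything after the first ':' is the nameservice id.
      String nameService = serviceName.substring(serviceName.indexOf(':') + 1);
      Map<String, InetSocketAddress> nnMap =
          haAddresses.getOrDefault(nameService, Collections.emptyMap());
      for (InetSocketAddress addr : nnMap.values()) {
        // getHostString() returns the configured name or literal without a reverse lookup.
        hosts.add(addr.getHostString());
      }
    } else {
      // Non-HA "host:port": drop the port after the last ':' (if any).
      int colon = serviceName.lastIndexOf(':');
      hosts.add(colon < 0 ? serviceName : serviceName.substring(0, colon));
    }
    return hosts;
  }
}
```

An unknown nameservice yields an empty set, which the caller can treat as "not the same cluster" and fall back to copying.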
>
>
> On 2019/02/11 06:12:43, 张铎(Duo Zhang) <p...@gmail.com> wrote:
> > How do we know if a webhdfs is the same as an hdfs?
> >
> > Schile,Nathan <na...@cerner.com.invalid> wrote on Mon, Feb 11, 2019 at 1:25 PM:
> >
> > > Currently, when bulk loading from a webhdfs filesystem, files are copied
> > > rather than renamed even when they reside on the same cluster [1]. This
> > > causes the bulk load to perform suboptimally.
> > >
> > > It seems like the configured webhdfs namenodes could be compared against
> > > the namenodes of the cluster being bulk loaded into; if they are the same,
> > > the bulk loaded files could be renamed rather than copied.
> > >
> > > I was able to locate a JIRA comment that brings up this use case [2] but
> > > wasn't able to find a comment or JIRA with a resolution.
> > >
> > > If this issue and the proposed solution are acceptable, I would be happy to
> > > log a JIRA and work on a patch. Please let me know how to proceed.
> > >
> > > [1] https://github.com/apache/hbase/blob/rel/2.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java#L369-L383
> > >
> > > [2] https://issues.apache.org/jira/browse/HBASE-8304?focusedCommentId=13923197&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13923197
> > >
> > > CONFIDENTIALITY NOTICE This message and any included attachments are from
> > > Cerner Corporation and are intended only for the addressee. The information
> > > contained in this message is confidential and may constitute inside or
> > > non-public information under international, federal, or state securities
> > > laws. Unauthorized forwarding, printing, copying, distribution, or use of
> > > such information is strictly prohibited and may be unlawful. If you are not
> > > the addressee, please promptly delete this message and notify the sender of
> > > the delivery error by e-mail or you may call Cerner's corporate offices in
> > > Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
> > >
> >
>
>
>
