On Mon, Feb 11, 2019 at 2:18 PM Schile,Nathan <nathan.sch...@cerner.com.invalid> wrote:
> You would compare the webhdfs addresses from
> DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme()) to the hdfs
> addresses from FSHDFSUtils.getNNAddresses(desFs, conf) and see if there is
> an intersection. Something like the below. My assumption being that the
> same host runs both hdfs and webhdfs. Is my understanding correct?
>
Mostly yes. WebHDFS is served by NameNodes. (HttpFS is another story though)

>
> public static boolean isCoercibleToHdfs(Configuration conf,
>     FileSystem srcFs, FileSystem desFs) {
>   if (isSameHdfs(conf, srcFs, desFs)) {
>     return true;
>   }
>
>   if (srcFs instanceof WebHdfsFileSystem
>       && desFs instanceof DistributedFileSystem) {
>     String srcServiceName = srcFs.getCanonicalServiceName();
>     String desServiceName = desFs.getCanonicalServiceName();
>
>     if (srcServiceName == null || desServiceName == null) {
>       return false;
>     }
>
>     // Only compare hostnames since the ports used by webhdfs and hdfs
>     // are different.
>     Set<String> webhdfsHostnames = new HashSet<>();
>     if (srcServiceName.startsWith("ha-webhdfs")
>         || srcServiceName.startsWith("ha-swebhdfs")) {
>       Map<String, Map<String, InetSocketAddress>> haNnWebHdfsAddresses =
>           DFSUtil.getHaNnWebHdfsAddresses(conf, srcFs.getScheme());
>       String nameService =
>           srcServiceName.substring(srcServiceName.indexOf(":") + 1);
>       if (haNnWebHdfsAddresses.containsKey(nameService)) {
>         Map<String, InetSocketAddress> nnMap =
>             haNnWebHdfsAddresses.get(nameService);
>         for (Map.Entry<String, InetSocketAddress> addressEntry :
>             nnMap.entrySet()) {
>           InetSocketAddress addr = addressEntry.getValue();
>           webhdfsHostnames.add(addr.getHostString());
>         }
>       }
>     } else {
>       webhdfsHostnames.add(srcServiceName.split(":")[0]);
>     }
>
>     // NameNode addresses of the destination (hdfs) filesystem.
>     Set<String> hdfsHostnames = new HashSet<>();
>     Set<InetSocketAddress> desAddrs =
>         getNNAddresses((DistributedFileSystem) desFs, conf);
>     for (InetSocketAddress address : desAddrs) {
>       hdfsHostnames.add(address.getHostString());
>     }
>
>     return !Sets.intersection(webhdfsHostnames, hdfsHostnames).isEmpty();
>   }
>   return false;
> }
>
> On 2019/02/11 06:12:43, 张铎(Duo Zhang) <p...@gmail.com> wrote:
> > How do we know if a webhdfs is the same as an hdfs?
> >
> > Schile,Nathan <na...@cerner.com.invalid> wrote on Mon, Feb 11, 2019
> > at 1:25 PM:
> >
> > > Currently when bulk loading from a webhdfs filesystem, files are
> > > copied rather than renamed if they reside on the same cluster [1].
> > > This causes the bulk load to not perform optimally.
> > >
> > > It seems like the configured webhdfs namenodes can be compared
> > > against those of the namenodes being bulk loaded to, and if they are
> > > the same, then the bulk loaded files could be renamed rather than
> > > copied.
> > >
> > > I was able to locate a JIRA comment bringing up this use case [2]
> > > but wasn't able to find a comment or JIRA with a resolution.
> > >
> > > If this issue and proposed solution are acceptable, I would be happy
> > > to log a JIRA and work on a patch. Please let me know how to proceed.
> > >
> > > [1]
> > > https://github.com/apache/hbase/blob/rel/2.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SecureBulkLoadManager.java#L369-L383
> > >
> > > [2]
> > > https://issues.apache.org/jira/browse/HBASE-8304?focusedCommentId=13923197&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13923197
> > >
> > > CONFIDENTIALITY NOTICE This message and any included attachments are
> > > from Cerner Corporation and are intended only for the addressee. The
> > > information contained in this message is confidential and may
> > > constitute inside or non-public information under international,
> > > federal, or state securities laws. Unauthorized forwarding, printing,
> > > copying, distribution, or use of such information is strictly
> > > prohibited and may be unlawful. If you are not the addressee, please
> > > promptly delete this message and notify the sender of the delivery
> > > error by e-mail or you may call Cerner's corporate offices in Kansas
> > > City, Missouri, U.S.A at (+1) (816)221-1024.
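
For anyone who wants to try the hostname-intersection idea from this thread without Hadoop on the classpath, here is a minimal stdlib-only sketch. It only illustrates the "compare hosts, ignore ports" step. The class name HostOverlap, the method names, the example.com hostnames and the port numbers are all made up for illustration and are not part of HBase or Hadoop. It also assumes hostnames are written identically on both sides (no case folding, no DNS resolution), which a real patch would probably need to handle.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an HBase or Hadoop class.
public class HostOverlap {

  /** Collects the host part of each address, ignoring the port. */
  static Set<String> hostnames(Collection<InetSocketAddress> addrs) {
    Set<String> hosts = new HashSet<>();
    for (InetSocketAddress addr : addrs) {
      // getHostString() does not trigger a reverse DNS lookup.
      hosts.add(addr.getHostString());
    }
    return hosts;
  }

  /** Returns true if at least one hostname appears in both collections. */
  static boolean sharesHost(Collection<InetSocketAddress> webhdfsAddrs,
      Collection<InetSocketAddress> hdfsAddrs) {
    Set<String> common = hostnames(webhdfsAddrs);
    common.retainAll(hostnames(hdfsAddrs));
    return !common.isEmpty();
  }

  public static void main(String[] args) {
    // Illustrative addresses only; the ports differ on purpose.
    List<InetSocketAddress> webhdfs = Arrays.asList(
        InetSocketAddress.createUnresolved("nn1.example.com", 50070),
        InetSocketAddress.createUnresolved("nn2.example.com", 50070));
    List<InetSocketAddress> hdfs = Arrays.asList(
        InetSocketAddress.createUnresolved("nn2.example.com", 8020));
    System.out.println(sharesHost(webhdfs, hdfs)); // prints "true"
  }
}
```

As noted in the reply above, this reasoning holds for WebHDFS served by the NameNodes; an HttpFS endpoint would run on a different host and would not match.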