I recommend method 2 (having a separate server pull the data and then place it into HDFS), also for security reasons.

2012/5/12 financeturd financeturd <financet...@yahoo.com>
> Hello,
>
> We have a large number of custom-generated files (not just web logs) that
> we need to move from our JBoss servers to HDFS. Our first implementation
> ran a cron job every 5 minutes to move our files from the "output"
> directory to HDFS.
>
> Is this recommended? We are being told by our IT team that our JBoss
> servers should not have access to HDFS for security reasons. The files
> must be "sucked" to HDFS by other servers that do not accept traffic
> from the outside. In essence, they are asking for a layer of
> indirection. Instead of:
> {JBoss server} --> {HDFS}
> it's being requested that it look like:
> {Separate server} <-- {JBoss server}
> and then
> {Separate server} --> HDFS
>
> While I understand in principle what is being said, the security of having
> processes on JBoss servers writing files to HDFS doesn't seem any worse
> than having Tomcat servers access a central database, which they do.
>
> Can anyone comment on what a recommended approach would be? Should our
> JBoss servers push their data to HDFS or should the data be pulled by
> another server and then placed into HDFS?
>
> Thank you!
> FT

--
Regards
Junyong
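
P.S. A rough sketch of what the pull approach could look like on the separate server, in case it helps. This is only an illustration under assumptions, not a tested setup: it assumes the separate server can reach the JBoss hosts with rsync over SSH and has the `hadoop` client installed. The host name, directory paths and function names are hypothetical. The command runner is passed in as a parameter so the logic can be tried without rsync or Hadoop present.

```python
import os
import subprocess


def build_pull_command(jboss_host, remote_dir, staging_dir):
    """rsync command run ON the separate server to pull files from a JBoss host.

    --remove-source-files deletes each file on the JBoss side once it has
    been transferred, so the next run does not fetch it again.
    """
    return [
        "rsync", "-a", "--remove-source-files",
        f"{jboss_host}:{remote_dir.rstrip('/')}/",
        f"{staging_dir.rstrip('/')}/",
    ]


def build_put_command(local_path, hdfs_dir):
    """Command that copies one staged file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def pull_and_load(jboss_host, remote_dir, staging_dir, hdfs_dir,
                  run=subprocess.check_call):
    """Pull files into the staging directory, then load each one into HDFS.

    `run` executes a command given as a list of arguments. The default,
    subprocess.check_call, raises on a non-zero exit status, so a file is
    removed from staging only after its put command has succeeded.
    """
    run(build_pull_command(jboss_host, remote_dir, staging_dir))
    loaded = []
    for name in sorted(os.listdir(staging_dir)):
        local_path = os.path.join(staging_dir, name)
        if not os.path.isfile(local_path):
            continue  # skip subdirectories in this simple sketch
        run(build_put_command(local_path, hdfs_dir))
        os.remove(local_path)
        loaded.append(name)
    return loaded


if __name__ == "__main__":
    # Hypothetical values; adjust to the real environment.
    pull_and_load("jboss01.example.com", "/var/app/output",
                  "/var/staging/jboss01", "/data/incoming")
```

Run from cron on the separate server, this keeps the 5-minute schedule while the JBoss servers never open a connection to HDFS themselves. One thing to watch is that the JBoss side should only expose finished files in the output directory (for example by writing elsewhere and renaming), otherwise a half-written file could be pulled.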