Great points, Olaf! :)

On Tue, Jul 14, 2015 at 12:37 AM, Olaf Flebbe <[email protected]> wrote:
> Hi everyone,
>
> nice to know someone with a scientific computing background is around, too.
>
> Let me phrase a bit differently what "native" to Linux means to me: The
> filesystem's API is the open(2), close(2) system calls, rather than a Java
> API. FUSE-mounted filesystems are generally slooooow because of the
> additional switch from kernel to userland needed for each system call. And
> let me stress that POSIX is only a small subset of what a filesystem has to
> do .. the semantics of ACLs, mmap(), symlinks, truncate(), locking and
> sparse files are way beyond POSIX or any standardization. It is funny that
> I have now seen code where the developer has problems porting to UNIX
> systems, since it uses special Linux semantics. Of course a cluster
> filesystem has to break the usual semantics in some way in order to achieve
> its performance.
>
> HPC with Linux clusters is one cornerstone of the business of s+c and we
> use cluster filesystems a lot. One special point to add: There are cluster
> filesystems which can exploit low-latency networks like InfiniBand with
> RDMA rather than TCP/IP, and zero-copy, which is beneficial
> performance-wise. Unfortunately there is no RDMA support in Hadoop as far
> as I know.
>
> Someone gave me a hint about Lustre integration of Hadoop lately. I still
> have to look into it....
>
> To use Spark or whatever directly on Ceph rather than HDFS, for instance,
> could be beneficial for scenarios like OpenStack where there is no notion
> of local disk -- aside from using "ironic".
>
> Olaf
>
>
> > On 14.07.2015 at 06:23, RJ Nowling <[email protected]> wrote:
> >
> > Thanks, Cos!
> >
> >> from Ignite standpoint replacing one with another doesn't give much
> >> advantage
> >
> > Agreed. From the standpoint of Ignite, Hadoop, or Spark, Gluster works no
> > differently than HDFS. If Ignite doesn't have an object store available
> > already, then Ceph could add that capability.
> >
> > From the standpoint of the user and integration with a larger IT
> > infrastructure, Gluster offers advantages over HDFS. As you say, Gluster
> > is a POSIX-compatible native filesystem -- it provides a FUSE module for
> > mounting remote Gluster volumes. This means non-Hadoop applications can
> > store data in the same file system as Hadoop.
> >
> > I come from a scientific computing background where pretty much every
> > simulation or analysis tool expected access to a POSIX file system. We
> > evaluated Hadoop at one point but chose not to use it because we would
> > have to copy all of our data into HDFS. Gluster is a much better POSIX
> > distributed file system than what my university's cluster used, and I
> > wish I had known about it while doing my Ph.D. :)
> >
> > For my work at Red Hat, we run Spark on Gluster. We don't use any
> > special plugins -- since Spark uses the Hadoop file system libraries,
> > Spark can read off native file systems. Same advantages mentioned above
> > -- nice to be able to use grep, cat, etc. alongside Spark :)
> >
> >
> > On Mon, Jul 13, 2015 at 8:55 PM, Konstantin Boudnik <[email protected]>
> > wrote:
> >
> >> On Mon, Jul 13, 2015 at 07:00PM, RJ Nowling wrote:
> >>> Cos,
> >>>
> >>> Can you expand on what you mean by "native to Linux" for Ceph?
> >>
> >> I meant that the file system is presented in a Linux distro as a kernel
> >> module. HDFS, as you know, is an alien Java process that creates a layer
> >> of indirection on top of, say, ext4 or jfs to provide distributed
> >> storage; Ceph does this similarly to other _native_ file systems.
> >>
> >>> And can you elaborate on why Gluster doesn't make sense as a HDFS
> >>> replacement to you?
> >>
> >> What I wanted to express, perhaps a bit clumsily, is that HDFS and
> >> Gluster are two instances of HCFS. From Ignite's standpoint, replacing
> >> one with another doesn't give much advantage (unless I am missing
> >> something about Gluster).
> >> Hopefully it makes sense?
> >>
> >>> Not trying to argue -- just genuinely curious. :)
> >>
> >> Not trying to cast a shadow on Gluster nor whitewash HDFS (far from it) ;)
> >>
> >> Cos
> >>
> >>> Thanks!
> >>>
> >>> On Mon, Jul 13, 2015 at 5:06 PM, Konstantin Boudnik <[email protected]>
> >>> wrote:
> >>>
> >>>> I think a file system is more universally used. However, one can build
> >>>> an FS on top of a good object storage - just need to provide some
> >>>> metadata abstraction/concept.
> >>>>
> >>>> Replacing HDFS w/ Gluster doesn't make much sense to me (if it is ever
> >>>> considered). What I like about Ceph is that it is native to Linux,
> >>>> unlike all other artificial HCFS contraptions. Hence my initial
> >>>> question.
> >>>>
> >>>> Cos
> >>>>
> >>>> On Thu, Jul 09, 2015 at 01:53AM, Dmitriy Setrakyan wrote:
> >>>>> Hm... I would think that file system would be more beneficial,
> >>>>> although object store on disk can also be valuable.
> >>>>>
> >>>>> Cos, what is your thinking?
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Wed, Jul 8, 2015 at 8:40 PM, RJ Nowling <[email protected]> wrote:
> >>>>>
> >>>>>> Ceph makes a better object store while Gluster makes a better file
> >>>>>> system. That's why Ceph is a popular backend for OpenStack Swift.
> >>>>>>
> >>>>>> Does Ignite want a FS or Object backend?
> >>>>>>
> >>>>>> On Wed, Jul 8, 2015 at 5:57 PM, Konstantin Boudnik <[email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Good point... although I was curious about Ignite's take on that
> >>>>>>> first and foremost. Yet, cross-posting to [email protected]
> >>>>>>>
> >>>>>>> Jay et al.: any thoughts about the combination?
> >>>>>>> Cos
> >>>>>>>
> >>>>>>> On Wed, Jul 08, 2015 at 03:14PM, Roman Shaposhnik wrote:
> >>>>>>>> I'm sure our RH brethren have something to say about Ceph.
> >>>>>>>> Re-post on dev@bigtop?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Roman.
> >>>>>>>>
> >>>>>>>> On Wed, Jul 8, 2015 at 3:17 PM, Konstantin Boudnik <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>> Guys,
> >>>>>>>>>
> >>>>>>>>> I was looking at the Hadoop accelerator the other day and have been
> >>>>>>>>> thinking about whether anyone has tried to use IGFS on top of a real
> >>>>>>>>> distributed file storage. The case in point is Ceph (ceph.com) - a
> >>>>>>>>> Linux file system available from any major Linux distribution as a
> >>>>>>>>> kernel module.
> >>>>>>>>>
> >>>>>>>>> HDFS has its share in the world, but it isn't the fastest, simplest,
> >>>>>>>>> nor most advantageous distributed storage on the planet. Hence I am
> >>>>>>>>> wondering if this would be a good call to provide Ignite on Ceph as
> >>>>>>>>> a 2nd FS capability.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>> Cos
