Great points, Olaf! :)

On Tue, Jul 14, 2015 at 12:37 AM, Olaf Flebbe <[email protected]> wrote:

> Hi everyone,
>
> nice to know someone with a scientific computing background is around, too.
>
> Let me phrase a bit differently what "native" to Linux means to me: the
> filesystem's API is the open(2) and close(2) system calls rather than a
> Java API. FUSE-mounted filesystems are generally slooooow because of the
> additional switch from kernel to userland needed for each system call. And
> let me stress that POSIX is only a small subset of what a filesystem has to
> do: the semantics of ACLs, mmap(), symlinks, truncate(), locking and sparse
> files are way beyond POSIX or any standardization. It is funny that I have
> now seen code that developers had trouble porting to other UNIX systems
> because it relies on special Linux semantics. Of course a cluster filesystem
> has to break the usual semantics in some way in order to achieve its
> performance.
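
To make the point about the system-call API concrete, here is a minimal sketch of the kind of operations a "native" filesystem has to honor. It is only an illustration: the function name is made up, and the directory is assumed to be any locally mounted path (a temporary directory stands in for a mounted cluster filesystem here). How a particular cluster filesystem behaves for each call is exactly the part that varies.

```python
import mmap
import os
import tempfile


def exercise_native_semantics(directory):
    """Exercise open/truncate/mmap/symlink through the OS system-call layer.

    `directory` is an assumption: any mounted path, e.g. a hypothetical
    kernel-mounted cluster filesystem. Returns (file size, first 5 bytes).
    """
    path = os.path.join(directory, "probe.bin")
    link = os.path.join(directory, "probe.lnk")

    # open(2) with explicit flags, much as a C program would call it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"hello")
        # ftruncate(2): grow the file to 1 MiB; whether the gap is stored
        # sparsely is up to the filesystem.
        os.ftruncate(fd, 1 << 20)
        # mmap(2): modify the file through a shared memory mapping.
        with mmap.mmap(fd, 1 << 20) as view:
            view[0:5] = b"HELLO"
            view.flush()
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)

    # symlink(2): read the data back through a symbolic link.
    os.symlink(path, link)
    with open(link, "rb") as handle:
        head = handle.read(5)

    os.remove(link)
    os.remove(path)
    return size, head


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(exercise_native_semantics(scratch))
```

Pointing `directory` at a FUSE mount instead of a kernel mount runs the same calls, each with the extra kernel-to-userland round trip described above.
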
>
> HPC with Linux clusters is one cornerstone of the business of s+c and we
> use cluster filesystems a lot. One special point to add: there are cluster
> filesystems which can exploit low-latency networks like InfiniBand with
> RDMA rather than TCP/IP, and zero-copy, which is beneficial for performance.
> Unfortunately there is no RDMA support in Hadoop as far as I know.
>
> Someone gave me a hint about Lustre integration for Hadoop lately. I still
> have to look into it....
>
> Using Spark or whatever directly on Ceph rather than HDFS, for instance,
> could be beneficial for scenarios like OpenStack where there is no notion
> of local disk -- aside from using "ironic".
>
> Olaf
>
>
>
>
> > On 14.07.2015 at 06:23, RJ Nowling <[email protected]> wrote:
> >
> > Thanks, Cos!
> >
> >> from Ignite standpoint replacing one with another doesn't give much
> >> advantage
> >
> > Agreed.  From the standpoint of Ignite, Hadoop, or Spark, Gluster works no
> > differently than HDFS.  If Ignite doesn't have an object store available
> > already, then Ceph could add that capability.
> >
> > From the standpoint of the user and integration with a larger IT
> > infrastructure, Gluster offers advantages over HDFS.  As you say, Gluster
> > is a POSIX-compatible native filesystem -- it provides a FUSE module for
> > mounting remote Gluster volumes.  This means non-Hadoop applications can
> > store data in the same file system as Hadoop.
> >
> > I come from a scientific computing background where pretty much every
> > simulation or analysis tool expected access to a POSIX file system.  We
> > evaluated Hadoop at one point but chose not to use it because we would have
> > to copy all of our data into HDFS.  Gluster is a much better POSIX
> > distributed file system than what my university's cluster used, and I wish
> > I had known about it while doing my Ph.D.  :)
> >
> > For my work at Red Hat, we run Spark on Gluster.  We don't use any special
> > plugins -- since Spark uses the Hadoop file system libraries, Spark can
> > read off native file systems.  Same advantages mentioned above -- nice to
> > be able to use grep, cat, etc. alongside Spark :)
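
A small sketch of what "same file system for Hadoop and non-Hadoop tools" looks like in practice. The helper names are hypothetical, and a temporary file stands in for a file on a mounted Gluster volume. The assumption, stated loosely, is that Hadoop-based tools address the mounted file through a `file://` URI while ordinary tools use the plain path.

```python
import tempfile
from pathlib import Path


def count_matches(mounted_file, needle):
    """Count lines containing `needle`, roughly what `grep -c` reports."""
    with open(mounted_file, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if needle in line)


def as_hadoop_style_uri(mounted_file):
    """Return the file:// URI for the same path.

    Assumption: a Hadoop-compatible client would be given this URI
    instead of an hdfs:// one; nothing here talks to Hadoop or Spark.
    """
    return Path(mounted_file).resolve().as_uri()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        data = Path(scratch) / "data.txt"
        data.write_text("spark on gluster\nhdfs only\n", encoding="utf-8")
        print(count_matches(data, "gluster"))
        print(as_hadoop_style_uri(data))
```

The point is only that no copy step sits between the two views: both names refer to the same bytes on the same mount.
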
> >
> >
> > On Mon, Jul 13, 2015 at 8:55 PM, Konstantin Boudnik <[email protected]> wrote:
> >
> >> On Mon, Jul 13, 2015 at 07:00PM, RJ Nowling wrote:
> >>> Cos,
> >>>
> >>> Can you expand on what you mean by "native to Linux" for Ceph?
> >>
> >> I meant that the file system is presented in a Linux distro as a kernel
> >> module.
> >> HDFS, as you know, is an alien Java process that creates a layer of
> >> indirection on top of, say, ext4 or JFS to provide distributed storage;
> >> Ceph does this similarly to other _native_ file systems.
> >>
> >>> And can you elaborate on why Gluster doesn't make sense as a HDFS
> >>> replacement to you?
> >>
> >> What I wanted to express, perhaps a bit clumsily, is that HDFS and Gluster
> >> are two instances of HCFS. From the Ignite standpoint, replacing one with
> >> another doesn't give much advantage (unless I am missing something about
> >> Gluster).
> >> Hopefully that makes sense?
> >>
> >>> Not trying to argue -- just generally curious.  :)
> >>
> >> Not trying to cast a shadow on Gluster nor whitewash HDFS (far from it) ;)
> >>
> >> Cos
> >>
> >>> Thanks!
> >>>
> >>> On Mon, Jul 13, 2015 at 5:06 PM, Konstantin Boudnik <[email protected]> wrote:
> >>>
> >>>> I think a file system is more universally used. However, one can build
> >>>> an FS on top of a good object storage - one just needs to provide some
> >>>> metadata abstraction/concept.
> >>>>
> >>>> Replacing HDFS w/ Gluster doesn't make much sense to me (if it is ever
> >>>> considered). What I like about Ceph is that it is native to Linux,
> >>>> unlike all other artificial HCFS contraptions. Hence my initial question.
> >>>>
> >>>> Cos
> >>>>
> >>>> On Thu, Jul 09, 2015 at 01:53AM, Dmitriy Setrakyan wrote:
> >>>>> Hm... I would think that a file system would be more beneficial,
> >>>>> although an object store on disk can also be valuable.
> >>>>>
> >>>>> Cos, what is your thinking?
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Wed, Jul 8, 2015 at 8:40 PM, RJ Nowling <[email protected]> wrote:
> >>>>>
> >>>>>> Ceph makes a better object store while Gluster makes a better file
> >>>>>> system.  That's why Ceph is a popular backend for OpenStack Swift.
> >>>>>>
> >>>>>> Does Ignite want a FS or Object backend?
> >>>>>>
> >>>>>> On Wed, Jul 8, 2015 at 5:57 PM, Konstantin Boudnik <[email protected]> wrote:
> >>>>>>
> >>>>>>> Good point... although I was curious about Ignite's take on that
> >>>>>>> first and foremost. Yet, cross-posting to [email protected]
> >>>>>>>
> >>>>>>> Jay et al.: any thoughts about the combination?
> >>>>>>>  Cos
> >>>>>>>
> >>>>>>> On Wed, Jul 08, 2015 at 03:14PM, Roman Shaposhnik wrote:
> >>>>>>>> I'm sure our RH brethren have something to say about Ceph.
> >>>>>>>> Re-post on dev@bigtop?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Roman.
> >>>>>>>>
> >>>>>>>> On Wed, Jul 8, 2015 at 3:17 PM, Konstantin Boudnik <[email protected]> wrote:
> >>>>>>>>> Guys,
> >>>>>>>>>
> >>>>>>>>> I was looking at the Hadoop accelerator the other day and have been
> >>>>>>>>> wondering whether anyone has tried to use IGFS on top of a real
> >>>>>>>>> distributed file storage. The case in point is Ceph (ceph.com) - a
> >>>>>>>>> Linux file system available from any major Linux distribution as a
> >>>>>>>>> kernel module.
> >>>>>>>>>
> >>>>>>>>> HDFS has its share in the world, but it isn't the fastest, simplest,
> >>>>>>>>> nor most advantageous distributed storage on the planet. Hence I am
> >>>>>>>>> wondering if it would be a good call to provide Ignite on Ceph as a
> >>>>>>>>> 2nd FS capability.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>  Cos
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
>
>
