Re: Ignite as distributed file storage

Pavel Kovalenko Thu, 05 Jul 2018 12:44:13 -0700

Vladimir,

The key difference between BLOB storage and IGFS is that BLOB storage will
have a persistence-based architecture with the option to cache blocks
off-heap (using mmap, which is simpler because we delegate caching to the
OS), while IGFS has an in-memory architecture with the option to persist
blocks.
BLOB storage will be able to work with a small amount of RAM without a
significant performance drop (using zero-copy from socket to disk), and in
the opposite case it can keep all available blocks off-heap if they fit
(using mmap again).
IGFS performs a lot of operations on blocks on-heap, which leads to
unnecessary data copies, long GC pauses and a performance drop. The whole
IGFS architecture is tightly bound to in-memory features, so it's too hard
to rewrite IGFS in a persistence-based manner. But cool IGFS features such
as intelligent affinity routing and chunk colocation will be reused in BLOB
storage.
Does it make sense?
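
To make the two OS-level mechanisms above more concrete, here is a minimal
Java sketch, not a proposed implementation. The class and method names
(BlobBlockSketch, writeBlock, mapBlock) are hypothetical. It assumes a block
is a plain file: the write path hands the transfer to
FileChannel.transferFrom, so our own code never stages the payload in a heap
byte[] (whether the JVM and OS really avoid intermediate copies depends on
the platform and the source channel type), and the read path maps the block
with FileChannel.map so that caching is left to the OS page cache.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BlobBlockSketch {
    /**
     * Streams up to {@code size} bytes from {@code src} (e.g. a SocketChannel)
     * into a block file and returns the number of bytes actually written.
     */
    public static long writeBlock(ReadableByteChannel src, Path blockFile, long size)
            throws IOException {
        try (FileChannel out = FileChannel.open(blockFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long written = 0;
            while (written < size) {
                // The channel implementation moves the bytes; we never touch them.
                long n = out.transferFrom(src, written, size - written);
                if (n <= 0) {
                    break; // source exhausted before 'size' bytes arrived
                }
                written += n;
            }
            return written;
        }
    }

    /** Maps a whole block read-only; the pages live off-heap in the OS page cache. */
    public static MappedByteBuffer mapBlock(Path blockFile) throws IOException {
        try (FileChannel in = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the network socket in this demo.
        byte[] payload = "hello blob".getBytes(StandardCharsets.UTF_8);
        Path file = Files.createTempFile("blob-block", ".bin");

        long n = writeBlock(
            Channels.newChannel(new ByteArrayInputStream(payload)), file, payload.length);

        MappedByteBuffer buf = mapBlock(file);
        byte[] back = new byte[buf.remaining()];
        buf.get(back);
        System.out.println(n + " bytes: " + new String(back, StandardCharsets.UTF_8));
    }
}
```

With little RAM the OS simply evicts mapped pages, and with plenty of RAM
the same mapping keeps every block cached, without any change in our code.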




2018-07-05 19:01 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Pavel,
> The design you described is almost precisely what IGFS does. It has a cache
> for metadata and splits binary data into chunks with intelligent affinity
> routing. In addition we have a map-reduce feature on top of it and
> integration with the underlying file system with optional caching. Data can
> be accessed in blocks or streams. IGFS is not in active development, but it
> is not outdated either.
> Can you briefly explain why you think that we need to drop IGFS and
> re-implement almost the same thing from scratch?
>
> Dima, Sergey,
> Yes, we need the BLOB support you described. Unfortunately it is not that
> easy to implement from the SQL perspective. To support it we would need
> either MVCC (with its own drawbacks) or read locks for SELECT.
>
> Vladimir.
>
> On Tue, Jul 3, 2018 at 10:40 AM Sergey Kozlov <skoz...@gridgain.com>
> wrote:
>
> > Dmitriy
> >
> > You're right that storing large objects should be optimized.
> >
> > Let's assume a large object means a regular object with large fields,
> > and such fields won't be used for comparison. Then we don't have to
> > restore the BLOB fields in off-heap page memory, e.g. for SQL queries
> > whose SELECT doesn't include them explicitly. This can reduce page
> > eviction, speed up performance and lower the chance of an OOM.
> >
> >
> >
> > On Tue, Jul 3, 2018 at 1:06 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > wrote:
> >
> > > To be honest, I am not sure if we need to kick off another file system
> > > storage discussion in Ignite. It sounds like a huge effort and likely
> > > will not be productive.
> > >
> > > However, I think an ability to store large objects will make sense. For
> > > example, how do I store a 10GB blob in Ignite cache? Most likely we
> > > have to have a separate memory or disk space, allocated for blobs only.
> > > We also need to be able to efficiently transfer a 10GB Blob object over
> > > the network and store it off-heap right away, without bringing it into
> > > main heap memory (otherwise we would run out of memory).
> > >
> > > I suggest that we create an IEP about this use case alone and leave the
> > > file system for the future discussions.
> > >
> > > D.
> > >
> > > On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov <voze...@gridgain.com>
> > > wrote:
> > >
> > > > Pavel,
> > > >
> > > > Thank you. I'll wait for feature comparison and concrete use cases,
> > > > because for me this feature still sounds too abstract to judge
> > > > whether product would benefit from it.
> > > >
> > > > On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko <jokse...@gmail.com>
> > > > wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > I think we have a little miscommunication here. Of course, I meant
> > > > > supporting large entries / chunks of binary data. Internally it
> > > > > will be BLOB storage, which can be accessed through various
> > > > > interfaces. "File" is just an abstraction for an end user for
> > > > > convenience, a wrapper layer to have a user-friendly API to store
> > > > > BLOBs directly. We shouldn't support a full file protocol with
> > > > > file system capabilities. It can be added later, but now it's
> > > > > absolutely unnecessary and introduces extra complexity.
> > > > >
> > > > > We can implement our BLOB storage step by step. The first thing is
> > > > > core functionality and support for saving large parts of binary
> > > > > objects to it. "File" layer, Web layer, etc. can be added later.
> > > > >
> > > > > The initial IGFS design doesn't have good capabilities to have a
> > > > > persistence layer. I think we shouldn't make any changes to it;
> > > > > this project, as for me, is almost outdated. We will drop IGFS
> > > > > after implementing a File System layer over our BLOB storage.
> > > > >
> > > > > Vladimir,
> > > > >
> > > > > I will prepare a comparison with other existing distributed file
> > > > > storages and file systems in a few days.
> > > > >
> > > > > About using the data grid: I never said that we need transactions,
> > > > > sync backup, etc. We need just a few core things: Atomic cache
> > > > > with persistence, Discovery, Baseline, Affinity, and Communication.
> > > > > Other things we can implement by ourselves. So this feature can
> > > > > develop independently of other non-core features.
> > > > > For me, the Ignite way is providing our users a fast and
> > > > > convenient way to solve their problems with good performance and
> > > > > durability. We have a problem with storing large data; we should
> > > > > solve it.
> > > > > About other things, see my message to Dmitriy above.
> > > > >
> > > > > On Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan <dsetrak...@apache.org>:
> > > > >
> > > > > > Pavel,
> > > > > >
> > > > > > I have actually misunderstood the use case. To be honest, I
> > > > > > thought that you were talking about the support of large values
> > > > > > in Ignite caches, e.g. objects that are several megabytes in
> > > > > > cache.
> > > > > >
> > > > > > If we are tackling the distributed file system, then in my view,
> > > > > > we should be talking about IGFS and adding persistence support
> > > > > > to IGFS (which is based on HDFS API). It is not clear to me that
> > > > > > you are talking about IGFS. Can you confirm?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > >
> > > > > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko <jokse...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Dmitriy,
> > > > > > >
> > > > > > > Yes, I have an approximate design in mind. The main idea is
> > > > > > > that we already have a distributed cache for file metadata
> > > > > > > (our Atomic cache); the data flow and distribution will be
> > > > > > > controlled by our AffinityFunction and Baseline. We already
> > > > > > > have discovery and communication to keep such local file
> > > > > > > storages in sync. The file data will be split into large
> > > > > > > blocks (64-128 MB), which looks very similar to our WAL. Each
> > > > > > > block can contain one or more file chunks. The tablespace
> > > > > > > (segment ids, offsets, etc.) will be stored in our regular
> > > > > > > page memory. These are the key ideas for implementing the
> > > > > > > first version of such storage. We already have similar
> > > > > > > components in our persistence, so this experience can be
> > > > > > > reused to develop such storage.
> > > > > > >
> > > > > > > Denis,
> > > > > > >
> > > > > > > Nothing significant should be changed at our memory level. It
> > > > > > > will be a separate, pluggable component over the cache. Most
> > > > > > > of the functions which give a performance boost can be
> > > > > > > delegated to the OS level (memory-mapped files, DMA, direct
> > > > > > > write from socket to disk and vice versa). Ignite and File
> > > > > > > Storage can develop independently of each other.
> > > > > > >
> > > > > > > Alexey Stelmak, who has great experience developing such
> > > > > > > systems, can provide more low-level information about how it
> > > > > > > should look.
> > > > > > >
> > > > > > > On Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan <dsetrak...@apache.org>:
> > > > > > >
> > > > > > > > Pavel, it definitely makes sense. Do you have a design in mind?
> > > > > > > >
> > > > > > > > D.
> > > > > > > >
> > > > > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko <jokse...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion about designing a new
> > > > > > > > > feature because I think it's time to start making steps
> > > > > > > > > towards it.
> > > > > > > > > I noticed that some of our users have tried to store large
> > > > > > > > > homogeneous entries (> 1, 10, 100 Mb/Gb/Tb) in our caches,
> > > > > > > > > but without much success.
> > > > > > > > >
> > > > > > > > > The IGFS project makes this possible, but as for me it has
> > > > > > > > > one big disadvantage: it's in-memory only, so users have a
> > > > > > > > > strict size limit on their data and a data loss problem.
> > > > > > > > >
> > > > > > > > > Our durable memory can persist data that doesn't fit in
> > > > > > > > > RAM to disk, but its page structure is not designed to
> > > > > > > > > store large pieces of data.
> > > > > > > > >
> > > > > > > > > There are a lot of distributed file system projects like
> > > > > > > > > HDFS, GlusterFS, etc. But all of them concentrate on
> > > > > > > > > implementing a high-grade file protocol rather than a
> > > > > > > > > user-friendly API, which leads to a high entry threshold
> > > > > > > > > for starting to implement something on top of them.
> > > > > > > > > We shouldn't go this way. Our main goal should be
> > > > > > > > > providing the user an easy and fast way to use file
> > > > > > > > > storage and processing here and now.
> > > > > > > > >
> > > > > > > > > If we take HDFS as the closest project by functionality,
> > > > > > > > > we have one big advantage over it. We can use our caches
> > > > > > > > > as file metadata storage and have an unlimited ability to
> > > > > > > > > scale it, while HDFS is bounded by Namenode capacity and
> > > > > > > > > has big problems with keeping a large number of files in
> > > > > > > > > the system.
> > > > > > > > >
> > > > > > > > > We gained very good experience with persistence when we
> > > > > > > > > developed our durable memory, and we can combine it with
> > > > > > > > > our experience with services, binary protocol and I/O and
> > > > > > > > > start to design a new IEP.
> > > > > > > > >
> > > > > > > > > Use cases and features of the project:
> > > > > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text,
> > > > > > > > > etc. without overhead and data loss possibility.
> > > > > > > > > 2) Easy, pluggable, fast and distributed file processing,
> > > > > > > > > transformation and analysis (e.g. an ImageMagick processor
> > > > > > > > > for image transformation, LuceneIndex for texts, whatever;
> > > > > > > > > it's bounded only by your imagination).
> > > > > > > > > 3) Scalability out of the box.
> > > > > > > > > 4) User-friendly API and minimal steps to start using this
> > > > > > > > > storage in production.
> > > > > > > > >
> > > > > > > > > I repeat, this project is not supposed to be a high-grade
> > > > > > > > > distributed file system with full file protocol support.
> > > > > > > > > This project should primarily focus on target users who
> > > > > > > > > would like to use it without complex preparation.
> > > > > > > > >
> > > > > > > > > For example, a user can deploy Ignite with such storage
> > > > > > > > > and a web server with a REST API as an Ignite service and
> > > > > > > > > get a scalable, performant image server out of the box
> > > > > > > > > which can be accessed from any programming language.
> > > > > > > > >
> > > > > > > > > As a long-term goal, we should focus on storing and
> > > > > > > > > processing very large amounts of data like movies and
> > > > > > > > > streaming, which is a big trend today.
> > > > > > > > >
> > > > > > > > > I would like to say special thanks to our community
> > > > > > > > > members Alexey Stelmak and Dmitriy Govorukhin, who
> > > > > > > > > significantly helped me to put together all pieces of that
> > > > > > > > > puzzle.
> > > > > > > > >
> > > > > > > > > So, I want to hear your opinions about this proposal.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>
