Vladimir,

The key difference between BLOB storage and IGFS is that BLOB storage will have a persistence-based architecture with the option to cache blocks off-heap (using mmap, which is simpler because we delegate caching to the OS), while IGFS has an in-memory architecture with the option to persist blocks.

BLOB storage will be able to work with a small amount of RAM without a significant performance drop (using zero-copy from socket to disk), and in the opposite case it can keep all available blocks off-heap if possible (using mmap again).

IGFS performs a lot of block operations on-heap, which leads to unnecessary data copies, long GC pauses and a performance drop. The whole IGFS architecture is tightly bound to in-memory features, so it's too hard to rewrite IGFS in a persistence-based manner. But cool IGFS features such as intelligent affinity routing and chunk colocation will be reused in BLOB storage.

Does it make sense?
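
To make the two OS-level mechanisms mentioned above more concrete, here is a minimal sketch in plain Java NIO. It is an illustration only, not a proposed implementation: the class `BlockFileSketch` and the methods `writeBlock` and `mapBlock` are hypothetical names and are not part of Ignite or IGFS. `FileChannel.transferFrom` lets the JDK and the OS move bytes from a channel into the file without our code allocating a heap array (whether the kernel performs a true zero-copy transfer depends on the channel types and the platform), and `FileChannel.map` exposes a stored block through the OS page cache.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not an Ignite or IGFS class. */
public class BlockFileSketch {
    /**
     * Copies up to len bytes from src (for example a SocketChannel) into
     * blockFile. Our code never allocates a heap byte[]; the JDK may still
     * use an internal buffer for channel types the OS cannot splice directly.
     */
    static long writeBlock(ReadableByteChannel src, Path blockFile, long len) throws IOException {
        try (FileChannel out = FileChannel.open(blockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long written = 0;
            while (written < len) {
                long n = out.transferFrom(src, written, len - written);
                if (n <= 0)
                    break; // Source exhausted (a blocking source is assumed here).
                written += n;
            }
            return written;
        }
    }

    /**
     * Maps the whole block file read-only. The OS decides which pages stay in
     * RAM, so a node with little memory can still serve the block. The mapping
     * stays valid after the channel is closed.
     */
    static MappedByteBuffer mapBlock(Path blockFile) throws IOException {
        try (FileChannel in = FileChannel.open(blockFile, StandardOpenOption.READ)) {
            return in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "blob chunk".getBytes(StandardCharsets.UTF_8);
        Path file = Files.createTempFile("block", ".bin");
        file.toFile().deleteOnExit();

        // A byte-array-backed channel stands in for a network socket here.
        ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(payload));
        long written = writeBlock(src, file, payload.length);

        MappedByteBuffer buf = mapBlock(file);
        byte[] back = new byte[buf.remaining()];
        buf.get(back);
        System.out.println(written + " bytes stored, read back: "
            + new String(back, StandardCharsets.UTF_8));
    }
}
```

In a real node the source would be a socket channel and blocks would be far larger than this toy payload; the sketch only shows that both paths can avoid on-heap block copies in application code.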
2018-07-05 19:01 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Pavel,
>
> The design you described is almost precisely what IGFS does. It has a
> cache for metadata and splits binary data into chunks with intelligent
> affinity routing. In addition, we have a map-reduce feature on top of it
> and integration with an underlying file system with optional caching.
> Data can be accessed in blocks or streams. IGFS is not in active
> development, but it is not outdated either.
>
> Can you briefly explain why you think that we need to drop IGFS and
> re-implement almost the same thing from scratch?
>
> Dima, Sergey,
>
> Yes, we need the BLOB support you described. Unfortunately, it is not
> that easy to implement from the SQL perspective. To support it we would
> need either MVCC (with its own drawbacks) or read locks for SELECT.
>
> Vladimir.
>
> On Tue, Jul 3, 2018 at 10:40 AM Sergey Kozlov <skoz...@gridgain.com>
> wrote:
>
> > Dmitriy
> >
> > You're right that storing large objects should be optimized.
> >
> > Let's assume a large object means a regular object with large fields,
> > and such fields won't be used for comparison. Then we can skip
> > restoring the BLOB fields in off-heap page memory, e.g. for SQL
> > queries whose SELECT doesn't include them explicitly. This can reduce
> > page eviction, speed up performance and lower the chance of an OOM.
> >
> > On Tue, Jul 3, 2018 at 1:06 AM, Dmitriy Setrakyan
> > <dsetrak...@apache.org> wrote:
> >
> > > To be honest, I am not sure if we need to kick off another file
> > > system storage discussion in Ignite. It sounds like a huge effort
> > > and likely will not be productive.
> > >
> > > However, I think an ability to store large objects will make sense.
> > > For example, how do I store a 10GB blob in an Ignite cache? Most
> > > likely we have to have a separate memory or disk space, allocated
> > > for blobs only.
> > > We also need to be able to efficiently transfer a 10GB Blob object
> > > over the network and store it off-heap right away, without bringing
> > > it into main heap memory (otherwise we would run out of memory).
> > >
> > > I suggest that we create an IEP about this use case alone and leave
> > > the file system for the future discussions.
> > >
> > > D.
> > >
> > > On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov
> > > <voze...@gridgain.com> wrote:
> > >
> > > > Pavel,
> > > >
> > > > Thank you. I'll wait for feature comparison and concrete use
> > > > cases, because for me this feature still sounds too abstract to
> > > > judge whether product would benefit from it.
> > > >
> > > > On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko
> > > > <jokse...@gmail.com> wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > I think we have a little miscommunication here. Of course, I
> > > > > meant supporting large entries / chunks of binary data.
> > > > > Internally it will be BLOB storage, which can be accessed
> > > > > through various interfaces.
> > > > > "File" is just an abstraction for an end user for convenience,
> > > > > a wrapper layer to have user-friendly API to directly store
> > > > > BLOBs. We shouldn't support full file protocol support with
> > > > > file system capabilities. It can be added later, but now it's
> > > > > absolutely unnecessary and introduces extra complexity.
> > > > >
> > > > > We can implement our BLOB storage step by step. The first thing
> > > > > is core functionality and support to save large parts of binary
> > > > > objects to it. "File" layer, Web layer, etc. can be added later.
> > > > >
> > > > > The initial IGFS design doesn't have good capabilities to have
> > > > > a persistence layer. I think we shouldn't do any changes to it,
> > > > > this project as for me is almost outdated.
> > > > > We will drop IGFS after implementing a File System layer over
> > > > > our BLOB storage.
> > > > >
> > > > > Vladimir,
> > > > >
> > > > > I will prepare a comparison with other existing distributed
> > > > > file storages and file systems in a few days.
> > > > >
> > > > > About using the data grid: I never said that we need
> > > > > transactions, sync backups, etc. We need just a few core
> > > > > things: Atomic cache with persistence, Discovery, Baseline,
> > > > > Affinity, and Communication.
> > > > > Other things we can implement by ourselves. So this feature can
> > > > > develop independently of other non-core features.
> > > > > For me, the Ignite way is to provide our users with a fast and
> > > > > convenient way to solve their problems with good performance
> > > > > and durability. We have a problem with storing large data, and
> > > > > we should solve it.
> > > > > About other things, see my message to Dmitriy above.
> > > > >
> > > > > Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan
> > > > > <dsetrak...@apache.org>:
> > > > >
> > > > > > Pavel,
> > > > > >
> > > > > > I have actually misunderstood the use case. To be honest, I
> > > > > > thought that you were talking about the support of large
> > > > > > values in Ignite caches, e.g. objects that are several
> > > > > > megabytes in cache.
> > > > > >
> > > > > > If we are tackling the distributed file system, then in my
> > > > > > view, we should be talking about IGFS and adding persistence
> > > > > > support to IGFS (which is based on HDFS API). It is not clear
> > > > > > to me that you are talking about IGFS. Can you confirm?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko
> > > > > > <jokse...@gmail.com> wrote:
> > > > > >
> > > > > > > Dmitriy,
> > > > > > >
> > > > > > > Yes, I have an approximate design in mind.
> > > > > > > The main idea is that we already have a distributed cache
> > > > > > > for file metadata (our Atomic cache), and the data flow and
> > > > > > > distribution will be controlled by our AffinityFunction and
> > > > > > > Baseline. We already have discovery and communication to
> > > > > > > keep such local file storages in sync. The file data will
> > > > > > > be split into large blocks (64-128 MB), which looks very
> > > > > > > similar to our WAL. Each block can contain one or more file
> > > > > > > chunks. The tablespace (segment ids, offsets, etc.) will be
> > > > > > > stored in our regular page memory. These are the key ideas
> > > > > > > for implementing the first version of such storage. We
> > > > > > > already have similar components in our persistence, so this
> > > > > > > experience can be reused to develop such storage.
> > > > > > >
> > > > > > > Denis,
> > > > > > >
> > > > > > > Nothing significant should be changed at our memory level.
> > > > > > > It will be a separate, pluggable component over the cache.
> > > > > > > Most of the functions which give a performance boost can be
> > > > > > > delegated to the OS level (memory-mapped files, DMA, direct
> > > > > > > write from socket to disk and vice versa). Ignite and File
> > > > > > > Storage can develop independently of each other.
> > > > > > >
> > > > > > > Alexey Stelmak, who has great experience with developing
> > > > > > > such systems, can provide more low-level information about
> > > > > > > how it should look.
> > > > > > >
> > > > > > > Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan
> > > > > > > <dsetrak...@apache.org>:
> > > > > > >
> > > > > > > > Pavel, it definitely makes sense. Do you have a design in
> > > > > > > > mind?
> > > > > > > >
> > > > > > > > D.
> > > > > > > >
> > > > > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko
> > > > > > > > <jokse...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion about designing a new
> > > > > > > > > feature because I think it's time to start making steps
> > > > > > > > > towards it.
> > > > > > > > > I noticed that some of our users have tried to store
> > > > > > > > > large homogeneous entries (> 1, 10, 100 Mb/Gb/Tb) in our
> > > > > > > > > caches, but without much success.
> > > > > > > > >
> > > > > > > > > The IGFS project makes this possible, but as for me it
> > > > > > > > > has one big disadvantage: it's in-memory only, so users
> > > > > > > > > have a strict size limit on their data and face a data
> > > > > > > > > loss problem.
> > > > > > > > >
> > > > > > > > > Our durable memory can persist data that doesn't fit in
> > > > > > > > > RAM to disk, but its page structure is not designed to
> > > > > > > > > store large pieces of data.
> > > > > > > > >
> > > > > > > > > There are a lot of distributed file system projects
> > > > > > > > > like HDFS, GlusterFS, etc. But all of them concentrate
> > > > > > > > > on implementing a high-grade file protocol rather than
> > > > > > > > > a user-friendly API, which leads to a high entry
> > > > > > > > > threshold for implementing something over it.
> > > > > > > > > We shouldn't go this way. Our main goal should be to
> > > > > > > > > give the user an easy and fast way to use file storage
> > > > > > > > > and processing here and now.
> > > > > > > > >
> > > > > > > > > If we take HDFS as the closest project by
> > > > > > > > > functionality, we have one big advantage over it.
> > > > > > > > > We can use our caches as file metadata storage and
> > > > > > > > > scale it without limit, while HDFS is bounded by
> > > > > > > > > Namenode capacity and has big problems with keeping a
> > > > > > > > > large number of files in the system.
> > > > > > > > >
> > > > > > > > > We gained very good experience with persistence when we
> > > > > > > > > developed our durable memory, and we can combine it
> > > > > > > > > with our experience with services, binary protocol and
> > > > > > > > > I/O and start to design a new IEP.
> > > > > > > > >
> > > > > > > > > Use cases and features of the project:
> > > > > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text,
> > > > > > > > > etc. without overhead and without the possibility of
> > > > > > > > > data loss.
> > > > > > > > > 2) Easy, pluggable, fast and distributed file
> > > > > > > > > processing, transformation and analysis. (E.g.
> > > > > > > > > ImageMagick processor for image transformation,
> > > > > > > > > LuceneIndex for texts, whatever, it's bounded only by
> > > > > > > > > your imagination).
> > > > > > > > > 3) Scalability out of the box.
> > > > > > > > > 4) User-friendly API and minimal steps to start using
> > > > > > > > > this storage in production.
> > > > > > > > >
> > > > > > > > > I repeat, this project is not supposed to be a
> > > > > > > > > high-grade distributed file system with full file
> > > > > > > > > protocol support.
> > > > > > > > > This project should primarily focus on target users who
> > > > > > > > > would like to use it without complex preparation.
> > > > > > > > >
> > > > > > > > > For example, a user can deploy Ignite with such storage
> > > > > > > > > and a web server with a REST API as an Ignite service
> > > > > > > > > and get a scalable, performant image server out of the
> > > > > > > > > box which can be accessed using any programming
> > > > > > > > > language.
> > > > > > > > >
> > > > > > > > > As a long-term goal, we should focus on storing and
> > > > > > > > > processing very large amounts of data like movies and
> > > > > > > > > streaming, which is the big trend today.
> > > > > > > > >
> > > > > > > > > I would like to say special thanks to our community
> > > > > > > > > members Alexey Stelmak and Dmitriy Govorukhin, who
> > > > > > > > > significantly helped me to put together all pieces of
> > > > > > > > > that puzzle.
> > > > > > > > >
> > > > > > > > > So, I want to hear your opinions about this proposal.
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com