Hi Courtney,

Please clarify what you mean by prepared queries and query caching.
Do you mean caching query results? If so, to my mind materialized views
are the best approach here (Ignite 2 does not support them). Do you
have other good approaches in mind, e.g. ones implemented in other
databases?
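
For the prepared-queries part: do you mean parameterized statements, i.e.
keeping the generated SQL text stable and passing values as arguments, so
that any internal statement/plan cache keyed by the query string can be
reused? A rough sketch of what I have in mind (the Tweet table and id
column are just illustrative):

    import java.util.List;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.query.SqlFieldsQuery;

    // The SQL text never changes, only the arguments do, so a statement
    // cache keyed by the query string can reuse the parsed/prepared form.
    static List<List<?>> tweetById(IgniteCache<?, ?> cache, long id) {
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "SELECT content FROM Tweet WHERE id = ?").setArgs(id);
        return cache.query(qry).getAll();
    }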

2021-07-26 21:27 GMT+03:00, Valentin Kulichenko <valentin.kuliche...@gmail.com>:
> Hi Courtney,
>
> Generally speaking, query caching certainly makes sense. As far as I know,
> Ignite 2.x actually does that, but there is most likely room for
> improvement as well. We will look into this.
>
> As for the SQL API - the answer is yes. The requirement for a dummy cache
> is an artifact of the current architecture. This is 100% wrong and will be
> changed in 3.0.
>
> -Val
>
> On Sun, Jul 25, 2021 at 2:51 PM Courtney Robinson
> <courtney.robin...@hypi.io>
> wrote:
>
>> Something else came to mind: are there plans to support prepared queries?
>>
>> I recall someone saying before that Ignite does internally cache queries,
>> but it's not at all clear if or how it does that. I assume a simple hash
>> of the query isn't enough.
>>
>> We generate SQL queries based on user runtime settings and they can get
>> to hundreds of lines long. I imagine this means most of our queries are
>> not being cached, but there are patterns, so we could generate and manage
>> prepared queries ourselves.
>>
>> Also, will there be a dedicated API for running SQL queries, rather than
>> having to pass a SqlFieldsQuery to a cache that has nothing to do with
>> the table being queried? When I first started with Ignite years ago, this
>> was beyond confusing for me: I'm trying to run select x from B, but I
>> pass it to a cache called DUMMY or whatever arbitrary name...
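>>
>> To make the confusion concrete, today it looks something like this
>> (Ignite 2.x API; the DUMMY cache name is arbitrary and unrelated to B):
>>
>>     import java.util.List;
>>     import org.apache.ignite.Ignite;
>>     import org.apache.ignite.IgniteCache;
>>     import org.apache.ignite.cache.query.SqlFieldsQuery;
>>
>>     // The query is submitted through whatever cache handle you happen to
>>     // have, even though that cache has nothing to do with table B.
>>     static List<List<?>> selectFromB(Ignite ignite) {
>>         IgniteCache<?, ?> dummy = ignite.getOrCreateCache("DUMMY");
>>         return dummy.query(new SqlFieldsQuery("SELECT x FROM B")).getAll();
>>     }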
>>
>> On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson <
>> courtney.robin...@hypi.io>
>> wrote:
>>
>> > Andrey,
>> > Thanks for the response - see my comments inline.
>> >
>> >
>> >> I've gone through the questions but don't yet have the whole picture of
>> >> your use case.
>> >
>> >> Would you please clarify how exactly you use Ignite? What are the
>> >> integration points?
>> >>
>> >
>> > I'll try to clarify - we have a low/no-code platform. A user designs a
>> > model for their application and we map this model to Ignite tables and
>> > other data sources. The model I'll describe is what we're building now,
>> > expected to be in alpha some time in Q4 2021. Our current production
>> > architecture is different and isn't as generic; it is heavily tied to
>> > Ignite, and we've redesigned it to get some flexibility where Ignite
>> > doesn't provide what we want - things like window functions and other
>> > SQL-99 limitations.
>> >
>> > In the next-gen version we're working on, you can create a model for a
>> > Tweet(content, to) and we will create an Ignite table with content and
>> > to columns, using the type the user selects. This is the simplest case.
>> > We are adding generic support for sources and sinks and using Calcite as
>> > a data virtualisation layer. Ignite is one of the available
>> > sources/sinks.
>> >
>> > When a user creates a model for Tweet, we also allow them to specify how
>> > they want to index the data. We have a copy of the Calcite Elasticsearch
>> > adapter modified for Solr.
>> >
>> > When a source is queried (Ignite or any other that we support), we
>> > generate SQL that Calcite executes. Calcite pushes the generated queries
>> > down to Solr, Solr produces a list of IDs (in the case of Ignite), and we
>> > do a multi-get from Ignite to produce the actual results.
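>> >
>> > The Ignite leg of that is essentially a batched key lookup, roughly like
>> > this (cache name and key/value types here are illustrative):
>> >
>> >     import java.util.Map;
>> >     import java.util.Set;
>> >     import org.apache.ignite.Ignite;
>> >     import org.apache.ignite.IgniteCache;
>> >
>> >     // Solr answers the query and gives us the matching row IDs; Ignite
>> >     // is then only used as the record store via a multi-get.
>> >     static Map<String, Object> fetchRows(Ignite ignite, Set<String> idsFromSolr) {
>> >         IgniteCache<String, Object> tweets = ignite.cache("Tweet");
>> >         return tweets.getAll(idsFromSolr);
>> >     }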
>> >
>> > Obviously there's a lot more to this but that should give you a general
>> > idea.
>> >
>> >> And maybe share some experience with using Ignite SPIs?
>> >>
>> > Our evolution with Ignite started with the key-value + compute APIs. We
>> > used the SPIs then, but have since moved to using only the Ignite SQL
>> > API (we gave up transactions for this).
>> >
>> > We originally used the Indexing SPI to keep our own Lucene index of data
>> > in a cache. We did not use the Ignite FTS as it is very limited compared
>> > to what we allow customers to do. If I remember correctly, we were using
>> > an affinity compute job to send queries to the right Ignite node and
>> > then doing a multi-get to pull the data from caches.
>> > I think we used one or two other SPIs, and we found them very useful to
>> > be able to extend and customise Ignite without having to fork/change
>> > upstream classes. We only stopped using them because we eventually
>> > concluded that using the SQL-only API was better for numerous reasons.
>> >
>> >
>> >> We'll keep this information in mind while developing Ignite, because it
>> >> may help us make a better product.
>> >>
>> >> By the way, I'll try to answer the questions.
>> >>
>> >> >   1. Schema change - does that include the ability to change the
>> >> >   types of fields/columns?
>> >> Yes, we plan to support transparent conversion to a wider type on the fly
>> >> (e.g. 'int' to 'long'). This is a major point of our Live-schema concept.
>> >> In fact, there is no need to convert data on all the nodes in a
>> >> synchronous way, as old SQL databases do (if they support it at all); we
>> >> are going to support multiple schema versions and convert data on demand,
>> >> on a per-row basis, to the latest version, then write back the row.
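>> >>
>> >> A minimal, purely illustrative sketch of that read path (none of these
>> >> names are the real Ignite 3 internals):
>> >>
>> >>     import java.util.Map;
>> >>     import java.util.concurrent.ConcurrentHashMap;
>> >>
>> >>     public class LazyUpgradeStore {
>> >>         // A stored row: schema version plus the raw value
>> >>         // (schema v1 keeps an Integer, schema v2 a Long).
>> >>         record Row(int schemaVersion, Object value) {}
>> >>
>> >>         private static final int LATEST = 2;
>> >>         private final Map<Long, Row> rows = new ConcurrentHashMap<>();
>> >>
>> >>         void put(long id, int schemaVersion, Object value) {
>> >>             rows.put(id, new Row(schemaVersion, value));
>> >>         }
>> >>
>> >>         // Rows are upgraded only when touched: widen the value, bump the
>> >>         // version, write the row back so later reads see the new schema.
>> >>         Row read(long id) {
>> >>             Row row = rows.get(id);
>> >>             if (row != null && row.schemaVersion() < LATEST) {
>> >>                 row = new Row(LATEST, ((Integer) row.value()).longValue());
>> >>                 rows.put(id, row);
>> >>             }
>> >>             return row;
>> >>         }
>> >>     }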
>> >>
>> >
>> > I understand. The auto-conversion to a wider type makes sense.
>> >
>> >>
>> >> More complex things like 'String' -> 'int' are out of scope for now,
>> >> because they require executing user code on the critical path.
>> >>
>> >
>> > I would argue, though, that executing user code on the critical path
>> > shouldn't be a blocker for custom conversions. I feel that if a user is
>> > making an advanced enough integration to provide custom conversions,
>> > they would be aware that it impacts the system as a whole.
>> >
>> >> The limitation here is that the column MUST NOT be indexed, because an
>> >> index over data of different types is impossible.
>> >>
>> > Understood - I'd make the case that indexing should be pluggable. I
>> > would love for us to be able to take indexing away from Ignite in our
>> > impl. I think in Calcite the Postgres adapter does this by having a
>> > table whose type is "Index". The implementor would be left with the
>> > freedom to choose how that table answers index lookups. From Ignite's
>> > perspective it wouldn't care so long as the interface's contract is met;
>> > I could use an index that does a Lucene, ES, Solr or Redis lookup and
>> > the end result would be the same, but as the implementor I'm choosing
>> > the tradeoff I want in order to meet the organisation's goals.
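>> >
>> > Purely hypothetical, but the contract I'm imagining is something this
>> > small (none of these types exist in Ignite today):
>> >
>> >     import java.util.Map;
>> >
>> >     // A pluggable index: Ignite asks it for matching keys and notifies it
>> >     // of writes; how it answers (Lucene, Solr, Redis, ...) is entirely up
>> >     // to the implementor.
>> >     public interface PluggableIndex<K> {
>> >         Iterable<K> lookup(String column, Object lowerBound, Object upperBound);
>> >
>> >         void onRowWritten(K key, Map<String, Object> columns);
>> >
>> >         void onRowRemoved(K key);
>> >     }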
>> >
>> >
>> >>
>> >>  >  2. Will the new guaranteed consistency between APIs also mean SQL
>> >>  >  will gain transaction support?
>> >> Yes, we plan to have Transactional SQL.
>> >> DDL will be non-transactional though, and I wonder whether anything
>> >> supports that.
>> >>
>> > I'm not sure I know of anything that supports transactional DDL, so I
>> > don't think this is an issue, but I would say that a DDL statement in a
>> > transaction shouldn't fail the transaction. I believe in Ignite 2 there
>> > is a flag to turn this on or off; we should definitely keep that. In our
>> > case this matters because of the nature of the platform we provide: at
>> > development time only about 10% of the schema or other DB info is known
>> > - we generate the other 90% on the fly based on whatever customers
>> > decide to design from our UI.
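>> >
>> > To make the expectation concrete: from a client's point of view I'd hope
>> > transactional SQL ends up looking like plain JDBC semantics, roughly as
>> > below (the URL is the current thin-driver one, the table/columns are
>> > illustrative, and the "DDL doesn't doom the tx" behaviour is what I'm
>> > hoping for, not something that exists today):
>> >
>> >     import java.sql.Connection;
>> >     import java.sql.DriverManager;
>> >     import java.sql.PreparedStatement;
>> >     import java.sql.SQLException;
>> >
>> >     static void updateTweet(long id, String content) throws SQLException {
>> >         try (Connection conn =
>> >                  DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")) {
>> >             conn.setAutoCommit(false);
>> >             try (PreparedStatement st = conn.prepareStatement(
>> >                      "UPDATE Tweet SET content = ? WHERE id = ?")) {
>> >                 st.setString(1, content);
>> >                 st.setLong(2, id);
>> >                 st.executeUpdate();
>> >             }
>> >             conn.commit(); // and ideally a DDL statement in here wouldn't
>> >                            // force a rollback of the whole transaction
>> >         }
>> >     }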
>> >
>> >>
>> >> Ignite 3 will operate with Rows underneath, but the classic Table API
>> >> and Key-Value API will be available to a user at the same time, with all
>> >> consistency guarantees.
>> >
>> > Excellent!
>> >
>> >>
>> >>
>> >
>> >> >  3. Has there been any decision about how much of Calcite will be
>> >> >  exposed to the client? When using thick clients, it'll be hugely
>> >> >  beneficial to be able to work with Calcite APIs directly to provide
>> >> >  custom rules and optimizations to better suit organization needs
>> >> As of now, we have no plans to expose any Calcite API to a user.
>> >> AFAIK, we have our custom Calcite convention, custom rules that are
>> >> aware of the distributed environment, and additional AST nodes. The
>> >> rules MUST correctly propagate internal information about data
>> >> distribution, so I'm not sure we want to give low-level access to them.
>> >>
>> >
>> > Maybe we're an edge case, but for us access to the Calcite APIs would
>> > shift our current development somewhat. We're treating Ignite as a
>> > library that provides a good foundation, and we extend and customise it.
>> > Again, we may be an edge case, and maybe most people just want a
>> > database to put data into and get it back out, without controlling some
>> > of how it does those things.
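>> >
>> > To give an idea of what "access" would mean for us: even just being able
>> > to register extra rules on the planner Ignite builds would go a long
>> > way. Something like the sketch below (standard Calcite API; whether and
>> > how Ignite would expose the planner is of course the open question):
>> >
>> >     import org.apache.calcite.plan.RelOptPlanner;
>> >     import org.apache.calcite.rel.rules.CoreRules;
>> >
>> >     // Register additional rewrite rules alongside Ignite's own
>> >     // distribution-aware rules.
>> >     static void addOurRules(RelOptPlanner planner) {
>> >         planner.addRule(CoreRules.FILTER_INTO_JOIN);
>> >         planner.addRule(CoreRules.PROJECT_FILTER_TRANSPOSE);
>> >         // ...plus our own push-down rules for the Solr adapter
>> >     }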
>> >
>> >
>> >> > We index into Solr and use the Solr indices
>> >> Ignite 1-2 have poor support for TEXT queries, which are totally
>> >> unconfigurable. Also, the Lucene indices underneath are NOT persistent,
>> >> and fixing that would require too much effort. The GeoSpatial index has
>> >> the same issues, so we decided to drop them both along with the Indexing
>> >> SPI entirely.
>> >>
>> >> However, you can find activity on the dev list on the IndexQuery topic.
>> >> The guys are going to add IndexQuery (a scan query over a sorted index
>> >> that can use simple conditions) in Ignite 2. We also plan to have the
>> >> same functionality; maybe it is possible to add full-text search support
>> >> here as well. Will it work for you? What do you think?
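>> >>
>> >> Roughly, the idea is something like this (the API is still being
>> >> discussed, so the exact names may differ; the Tweet class, field and
>> >> threshold are just examples):
>> >>
>> >>     import static org.apache.ignite.cache.query.IndexQueryCriteriaBuilder.gt;
>> >>
>> >>     import java.util.List;
>> >>     import javax.cache.Cache;
>> >>     import org.apache.ignite.IgniteCache;
>> >>     import org.apache.ignite.cache.query.IndexQuery;
>> >>     import org.apache.ignite.cache.query.annotations.QuerySqlField;
>> >>
>> >>     // A user value class with an indexed 'created' field (example only).
>> >>     class Tweet {
>> >>         @QuerySqlField(index = true)
>> >>         long created;
>> >>     }
>> >>
>> >>     // A scan over the sorted index on 'created' with a simple condition,
>> >>     // no SQL involved.
>> >>     static List<Cache.Entry<Long, Tweet>> recentTweets(IgniteCache<Long, Tweet> cache) {
>> >>         IndexQuery<Long, Tweet> qry = new IndexQuery<Long, Tweet>(Tweet.class)
>> >>             .setCriteria(gt("created", 1_626_000_000L));
>> >>         return cache.query(qry).getAll();
>> >>     }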
>> >>
>> > Yes, we originally looked at text queries and almost immediately said
>> > no. Nothing about them was useful for us other than the Lucene
>> > dependency in Java. In the end that also became an issue, because we
>> > wanted a newer Lucene version.
>> > IndexQuery will be useful - we'll certainly use it, but it's not enough.
>> > I think we customise and depend on Solr too much for IndexQuery to
>> > compare, but it will help in some cases for simpler queries.
>> >
>> >>
>> >>
>> >> >    4. Will the unified storage model enable different versions of
>> >> >    Ignite to be in the cluster when persistence is enabled so that
>> >> >    rolling restarts can be done?
>> >> I'm not sure a rolling upgrade (RU) will be available, because too many
>> >> compatibility issues need to be resolved to make RU possible under load
>> >> without downtime.
>> >>
>> >> Maybe it makes sense to provide some grid mode (a maintenance mode) for
>> >> RU purposes that would block all user load but allow upgrading the grid,
>> >> e.g. for the pure in-memory case.
>> >>
>> >> Persistence compatibility should be preserved, as it is for Ignite 2.
>> >>
>> > My ideal situation would be that we start a newer Ignite version, it
>> > comes online, joins the cluster and is treated as being in some kind of
>> > maintenance mode, as you suggested. In maintenance mode, the other nodes
>> > rebalance (or run some other process) to send over all the data this new
>> > node will handle. The existing nodes continue serving this data until
>> > the new node is no longer in maintenance mode, and then it becomes the
>> > primary for the data that was rebalanced to it.
>> >
>> > The second case is when an existing node is restarted with a newer
>> > Ignite version. No rebalance is needed: it joins in maintenance mode,
>> > runs any upgrade/conversion or other task it needs to, and then starts
>> > accepting reads and writes. Communication with lower-version nodes can
>> > be limited; they are aware of it and send it the data and queries for
>> > which it is the primary, assuming they will also be upgraded.
>> >
>> > I guess I'm not aware of the compatibility issues this presents, so my
>> > view is narrow and perhaps naive here.
>> >
>> >>
>> >>
>> >> >    5. Will it be possible to provide a custom cache store still, and
>> >> >    will these changes enable custom cache stores to be queryable from
>> >> >    SQL?
>> >> I'm not sure I fully understand this.
>> >> 1. Usually, SQL is about indices. Ignite can't perform a query over
>> >> unindexed data.
>> >>
>> > Yes, understood.
>> >
>> >>
>> >> 2. A full scan over the cache that contains only part of the data, plus
>> >> a scan of the CacheStore, and then merging the results, is a pain.
>> >> Most likely, running the query over the CacheStore directly would be a
>> >> simpler way, and even more performant. A shared CacheStore (the same for
>> >> all nodes) will definitely kill performance in that case.
>> >> So, a preliminary loadCache() call looks like a good compromise.
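>> >>
>> >> I.e. warm the cache from the CacheStore first and only then query it
>> >> with SQL, something like this (illustrative table/columns, 'cache' being
>> >> the IgniteCache backing the Tweet table):
>> >>
>> >>     // Preload all entries from the configured CacheStore into the cache;
>> >>     // after that the SQL engine only looks at in-memory, indexed data.
>> >>     cache.loadCache(null);
>> >>     cache.query(new SqlFieldsQuery("SELECT id, content FROM Tweet")).getAll();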
>> >>
>> > I think the problem is largely that the CacheStore interface is not
>> > sufficient for being able to do this. If it had a richer interface which
>> > allowed the cache store to answer index queries - basically hooking into
>> > whatever Ignite is doing for its B+ tree - then this would be viable. A
>> > CacheStore that only implements the KV API doesn't take part in SQL
>> > queries.
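>> >
>> > Something along these lines is what I mean - a hypothetical extension,
>> > not a real Ignite API:
>> >
>> >     import java.util.Map;
>> >     import javax.cache.Cache;
>> >     import org.apache.ignite.cache.store.CacheStore;
>> >
>> >     // A CacheStore that can also answer index-style range lookups, so the
>> >     // SQL engine could push simple conditions down to it per partition.
>> >     public interface QueryableCacheStore<K, V> extends CacheStore<K, V> {
>> >         Iterable<Cache.Entry<K, V>> rangeScan(
>> >             int partition, String column, Object lower, Object upper);
>> >
>> >         // The columns (and their types) the store can answer lookups for.
>> >         Map<String, Class<?>> indexedColumns();
>> >     }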
>> >
>> >>
>> >> 3. Splitting a query into two parts, one to run on Ignite and one to
>> >> run on the CacheStore, looks possible with Calcite, but I think it is
>> >> impractical because, in general, neither the CacheStore nor the database
>> >> structure is aware of the data partitioning.
>> >>
>> > Hmmm, maybe I missed the point, but as the implementor of the CacheStore
>> > you should have knowledge of the structure and partition info, or have
>> > some way of retrieving it. Again, I think the current CacheStore
>> > interface is the problem: if it was extended to provide this information,
>> > then it's up to the implementation to do this, while Ignite knows that
>> > any implementation of these interfaces will meet the necessary contract.
>> >
>> >
>> >>
>> >> 4. Transactions can't be supported in the case of direct CacheStore
>> >> access, because even if the underlying database supports two-phase
>> >> commit, which is a rare case, the recovery protocol looks hard.
>> >> It just looks like this feature isn't worth it.
>> >>
>> > I'd completely agree with this. It would be incredibly hard to get this
>> > done reliably.
>> >
>> >>
>> >>
>> >> >   6. This question wasn't mine but I was going to ask it as well: what
>> >> >   will happen to the Indexing API since H2 is being removed?
>> >> As I wrote above, the Indexing SPI will be dropped, but IndexQuery will
>> >> be added.
>> >>
>> >> >  1. As I mentioned above, we index into Solr; in earlier versions of
>> >> >     our product we used the indexing SPI to index into Lucene on the
>> >> >     Ignite nodes, but this presented so many challenges we ultimately
>> >> >     abandoned it and replaced it with the current Solr solution.
>> >> AFAIK, some guys developed and sell a plugin for Ignite-2 with persistent
>> >> Lucene and Geo indices. I don't know the capabilities and limitations of
>> >> their solution, because the code is closed. You can easily google it.
>> >>
>> >> I saw a few enthusiastic guys who want to improve TEXT queries, but
>> >> unfortunately things haven't moved far enough. For now, they are in the
>> >> middle of fixing the merging of TEXT query results.
>> >> So far so good.
>> >>
>> >> I think it is a good chance to master the skill of developing a
>> >> distributed system for whoever takes the lead on the full-text search
>> >> feature and adds native full-text index support to Ignite-3.
>> >>
>> > I've seen the other thread, from Atri I believe, about this.
>> >
>> >>
>> >>
>> >> >   7. What impact does RAFT now have on conflict resolution?
>> >> RAFT is a state-machine replication protocol. It guarantees that all the
>> >> nodes see the updates in the same order, so it seems no conflicts are
>> >> possible. Recovery from split-brain is impossible in the general case.
>> >>
>> >> However, I think we have a conflict resolver analogue in Ignite-3, as it
>> >> is very useful in some cases, e.g. data-center replication, incremental
>> >> data load from a third-party source, or recovery from a third-party
>> >> source.
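>> >>
>> >> Conceptually it is just a pluggable "pick the winner" hook, something of
>> >> this shape (illustrative only, not the actual Ignite 3 interface):
>> >>
>> >>     // Given the value already stored locally and an incoming one (e.g.
>> >>     // from data-center replication), decide which version to keep.
>> >>     public interface ConflictResolver<V> {
>> >>         V resolve(V oldValue, V newValue);
>> >>     }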
>> >>
>> >>
>> >> > 8. CacheGroups.
>> >> AFAIK, CacheGroups will be eliminated as a separate concept; actually,
>> >> we'll keep the mechanic, but it will be configured in a different way,
>> >> which makes configuring Ignite a bit simpler.
>> >> Sorry, for now I have no answer to your performance concerns; this part
>> >> of Ignite-3 slipped off my radar.
>> >>
>> > No worries. I'll wait and see if anyone else suggests something. It's
>> > getting a lot worse: a node took 1 hr to start yesterday after a
>> > deployment, and it's in prod with very little visibility into what it is
>> > doing - it just stopped, with no logging or anything, and then resumed.
>> >
>> > 2021-07-22 13:40:15.997  INFO [ArcOS,,,] 9 --- [orker-#40%hypi%]
>> > o.a.i.i.p.cache.GridCacheProcessor      [285] :  Finished recovery for
>> > cache [cache=hypi_01F8ZC3DGT66RNYCDZH3XNVY2E_Hue, grp=hypi,
>> > startVer=AffinityTopologyVersion [topVer=79, minorTopVer=0]]
>> >
>> > One hour later it printed the next cache recovery message and then
>> > started 30 seconds after that, having gone through the other tables.
>> >
>> >
>> >
>> >>
>> >> Let's wait and see if someone can clarify what we could expect in
>> >> Ignite-3. Guys, can someone chime in and shed more light on questions 3,
>> >> 4, 7, and 8?
>> >>
>> >>
>> >> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <
>> >> courtney.robin...@hypi.io>
>> >> wrote:
>> >>
>> >> > Hey everyone,
>> >> > I attended the Alpha 2 update yesterday and was quite pleased to see
>> >> > the progress on things so far. So first, congratulations to everyone
>> >> > on the work being put in, and thank you to Val and Kseniya for running
>> >> > yesterday's event.
>> >> >
>> >> > I asked a few questions after the webinar which Val had some answers
>> >> > to, but he suggested posting here as some of them are not things that
>> >> > have been thought about yet, or no plans exist around them at this
>> >> > point.
>> >> >
>> >> > I'll put all of them here and, if necessary, we can break into
>> >> > different threads after.
>> >> >
>> >> >    1. Schema change - does that include the ability to change the
>> >> >    types of fields/columns?
>> >> >       1. Val's answer was yes, with some limitations, but those are
>> >> >       not well defined yet. He did mention that something like some
>> >> >       kind of transformer could be provided for doing the conversion,
>> >> >       and I would second this; even for common types like int to long,
>> >> >       being able to do a custom conversion will be immensely valuable.
>> >> >    2. Will the new guaranteed consistency between APIs also mean SQL
>> >> >    will gain transaction support?
>> >> >       1. I believe the answer here was yes, but perhaps someone else
>> >> >       may want to weigh in to confirm.
>> >> >    3. Has there been any decision about how much of Calcite will be
>> >> >    exposed to the client? When using thick clients, it'll be hugely
>> >> >    beneficial to be able to work with Calcite APIs directly to provide
>> >> >    custom rules and optimisations to better suit organisation needs.
>> >> >       1. We currently use Calcite ourselves, have a lot of custom
>> >> >       rules and optimisations, and have slowly pushed more of our
>> >> >       queries to Calcite that we then push down to Ignite.
>> >> >       2. We index into Solr and use the Solr indices and others to
>> >> >       fulfill overall queries, with Ignite just being one of the
>> >> >       possible storage targets Calcite pushes down to. If we could get
>> >> >       to the Calcite API from an Ignite thick client, it would enable
>> >> >       us to remove a layer of abstraction and complexity and make
>> >> >       Ignite our primary store that we then link with Solr and others
>> >> >       to fulfill queries.
>> >> >    4. Will the unified storage model enable different versions of
>> >> >    Ignite to be in the cluster when persistence is enabled, so that
>> >> >    rolling restarts can be done?
>> >> >       1. We have to do a strange dance to perform Ignite upgrades
>> >> >       without downtime, because pods/nodes will fail to start on a
>> >> >       version mismatch and, if we get that dance wrong, we will
>> >> >       corrupt a node's data. It will make admin/upgrades far less
>> >> >       brittle and error-prone if this was possible.
>> >> >    5. Will it be possible to provide a custom cache store still, and
>> >> >    will these changes enable custom cache stores to be queryable from
>> >> >    SQL?
>> >> >       1. Our Ignite usage is wide and complex because we use KV, SQL
>> >> >       and other APIs. The inconsistency of what can and can't be used
>> >> >       from one API to another is a real challenge and has forced us
>> >> >       over time to stick to one API and write alternative solutions
>> >> >       outside of Ignite. It will drastically simplify things if any
>> >> >       CacheStore (or some new equivalent) could be plugged in and be
>> >> >       made accessible to SQL (and in fact all other APIs) without
>> >> >       having to load all the data from the underlying CacheStore first
>> >> >       into memory.
>> >> >    6. This question wasn't mine but I was going to ask it as well:
>> >> >    what will happen to the Indexing API since H2 is being removed?
>> >> >       1. As I mentioned above, we index into Solr; in earlier versions
>> >> >       of our product we used the indexing SPI to index into Lucene on
>> >> >       the Ignite nodes, but this presented so many challenges we
>> >> >       ultimately abandoned it and replaced it with the current Solr
>> >> >       solution.
>> >> >       2. Lucene indexing was ideal because it meant we didn't have to
>> >> >       re-invent Solr or Elasticsearch's sharding capabilities; that
>> >> >       was almost automatic, with Ignite only giving you the data that
>> >> >       was meant for the current node.
>> >> >       3. The Lucene API enabled more flexibility and removed a network
>> >> >       round trip from our queries.
>> >> >       4. Given Calcite's ability to support custom SQL functions, I'd
>> >> >       love to have the ability to define custom functions that Lucene
>> >> >       was answering.
>> >> >    7. What impact does RAFT now have on conflict resolution? Off the
>> >> >    top of my head there are two cases:
>> >> >       1. On startup after a split brain, Ignite currently takes an
>> >> >       "exercise for the reader" approach and dumps a log along the
>> >> >       lines of:
>> >> >
>> >> > >    BaselineTopology of joining node is not compatible with
>> >> > >    BaselineTopology in the cluster. Branching history of cluster BlT
>> >> > >    doesn't contain branching point hash of joining node BlT.
>> >> > >    Consider cleaning persistent storage of the node and adding it to
>> >> > >    the cluster again.
>> >> > >
>> >> >       This leaves you with no choice except to take one half, manually
>> >> >       copy/write data back over to the other half, then destroy the
>> >> >       bad one.
>> >> >       2. The second case is conflicts on keys. I believe
>> >> >       CacheVersionConflictResolver and its manager are used by
>> >> >       GridCacheMapEntry, which just says "if using the old value do
>> >> >       this, otherwise use newVal". Ideally this will be exposed in the
>> >> >       new API so that one can override this behaviour. The
>> >> >       last-writer-wins approach isn't always ideal, and the semantics
>> >> >       of the domain can mean that what is considered "correct" in a
>> >> >       conflict for one domain is not so for a different domain.
>> >> >    8. This is last on the list but is actually the most important for
>> >> >    us right now, as it is an impending and growing risk. We allow
>> >> >    customers to create their own tables on demand. We're already using
>> >> >    the same cache group etc. for data structures to be re-used, but
>> >> >    now that we're getting to thousands of tables/caches our startup
>> >> >    times are sometimes unpredictably long - at present it seems to
>> >> >    depend on the state of the cache/table before the restart, but
>> >> >    we're into the order of 5-7 mins and steadily increasing with the
>> >> >    growth of tables. Are there any provisions in Ignite 3 for ensuring
>> >> >    startup time isn't proportional to the number of tables/caches
>> >> >    available?
>> >> >
>> >> >
>> >> > Those are the key things I can think of at the moment. Val and others,
>> >> > I'd love to open a conversation around these.
>> >> >
>> >> > Regards,
>> >> > Courtney Robinson
>> >> > Founder and CEO, Hypi
>> >> > Tel: +44 208 123 2413 (GMT+0)
>> >> >
>> >> > https://hypi.io
>> >> >
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >> Andrey V. Mashenkov
>> >>
>> >
>>
>


-- 

Best regards,
Ivan Pavlukhin
