H2 could give us fine-grained data access. However, most of our code
performs massive joins to reconstruct fully hydrated thrift objects.
Most of the time we are then only interested in very few properties of
those thrift structs. This applies to internal usage, but also to how
we use the API.

I therefore believe we have to improve and refine our domain model in
order to significantly improve the storage situation.

I really liked Maxim's proposal from last year, and I think it is worth
reconsidering:
https://docs.google.com/document/d/1myYX3yuofGr8JIzud98xXd5mqgpZ8q_RqKBpSff4-WE/edit

Best regards,
Stephan

On Thu, 2017-03-30 at 15:53 -0700, David McLaughlin wrote:
> So it sounds like before we make any decisions around removing the
> work
> done in H2 so far, we should figure out what is remaining to move to
> external storage (or if it's even still a goal).
> 
> I may still play around with reviving the in-memory stores, but will
> separate that work from any goal to remove the H2 layer. Since it's
> motivated by performance, I'd verify there is a benefit before
> submitting
> any review.
> 
> Thanks all for the feedback.
> 
> 
> On Thu, Mar 30, 2017 at 12:08 PM, Bill Farner <wfarnerapa...@gmail.com>
> wrote:
> 
> > Adding some background - there were several motivators to using SQL
> > that
> > come to mind:
> > a) well-understood transaction isolation guarantees leading to a
> > simpler
> > programming model w.r.t. concurrency
> > b) ability to offload storage to a separate system (e.g. Postgres)
> > and
> > scale it separately
> > c) relief of computational burden of performing snapshots and
> > backups due
> > to (b)
> > d) simpler code and operations model due to (b)
> > e) schema backwards compatibility guarantees due to persistence-
> > friendly
> > migration-scripts
> > f) straightforward normalization to facilitate sharing of
> > otherwise-redundant state (i.e. TaskConfig)
> > 
> > The storage overhaul comes with a huge caveat requiring the
> > approach to
> > scheduling rounds to change. I concur that the current model is
> > hostile to
> > offloaded storage, as ~all state must be read every scheduling
> > round. If
> > that cannot be worked around with lazy state or best-effort
> > concurrency
> > (i.e. in-memory caching), the approach is indeed flawed.
> > 
> > On Mar 30, 2017, 10:29 AM -0700, Joshua Cohen <jco...@apache.org>,
> > wrote:
> > > My understanding of the H2-backed stores is that at least part of
> > > the
> > > original rationale behind them was that they were meant to be an
> > > interim
> > > point on the way to external SQL-backed stores which should
> > > theoretically
> > > provide significant benefits w.r.t. GC (obviously unproven,
> > > especially
> > > at scale).
> > > 
> > > I don't disagree that the H2 stores themselves are problematic
> > > (to say the least); do we have evidence that returning to
> > > memory based stores will be an improvement on that?
> > > 
> > > On Thu, Mar 30, 2017 at 12:16 PM, David McLaughlin
> > > <dmclaugh...@apache.org> wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > I'd like to start a discussion around storage in Aurora.
> > > > 
> > > > I think one of the biggest mistakes we made in migrating our
> > > > storage to H2 was deleting the memory stores as we moved. We
> > > > made a pretty big bet that we could eventually make
> > > > H2/relational databases work. I don't think that bet has paid
> > > > off, and I think we need to revisit the direction we're taking.
> > > > 
> > > > My belief is that the current H2/MyBatis approach is untenable
> > > > for large production clusters, at least without changing our
> > > > current single-master architecture. At Twitter we are already
> > > > having to fight to keep GC manageable even without DbTaskStore
> > > > enabled, so I don't see a path forward where we could
> > > > eventually enable that. So far experiments with H2 off-heap
> > > > storage have provided marginal (if any) gains.
> > > > 
> > > > Would anyone object to restoring the in-memory stores and
> > > > creating new implementations for the missing ones
> > > > (UpdateStore)? I'd even go further and propose that we
> > > > consider in-memory H2 and MyBatis a failed experiment and we
> > > > drop that storage layer completely.
> > > > 
> > > > Cheers,
> > > > David
> > > > 
