Re: Multi-tenancy and caching issues

Romain Manni-Bucau Tue, 09 Jan 2024 02:58:56 -0800

I don't have everything ready for spring-data but had something like this in
mind:



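import static java.util.Objects.requireNonNull;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Stream;

// assumption: jakarta.persistence namespace (javax.persistence on older stacks)
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.PersistenceException;

import org.springframework.beans.factory.ListableBeanFactory;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;

@Configuration
public class RoutedEMFConf {
    @Bean
    @Primary
    LocalContainerEntityManagerFactoryBean mainEntityManagerFactory(
            final Tenant tenant, final ApplicationContext context) {
        // delegates keyed by bean name; can be other beans with qualifiers
        final var emfs = findDelegates(context);
        final var routedEmf = EntityManagerFactory.class.cast(Proxy.newProxyInstance(
                RoutedEMFConf.class.getClassLoader(),
                // opt: use SessionFactoryImplementor.class if you need hibernate internals
                new Class<?>[]{EntityManagerFactory.class, Marking.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "equals":
                            // assume there is a single one per app, otherwise complete the impl
                            return args[0] instanceof Marking;
                        case "hashCode":
                            return 1;
                        default:
                            // route every other call to the current tenant's EMF
                            try {
                                final var id = tenant.get();
                                return method.invoke(
                                        requireNonNull(emfs.get(id), () -> "No emf for '" + id + "'"),
                                        args);
                            } catch (final InvocationTargetException ite) {
                                // rethrow the delegate's own exception
                                throw ite.getTargetException();
                            }
                    }
                }
        ));
        return new LocalContainerEntityManagerFactoryBean() {
            @Override
            protected EntityManagerFactory createNativeEntityManagerFactory()
                    throws PersistenceException {
                return routedEmf;
            }
        };
    }

    // collects every EMF bean except the routed one, keyed by bean name
    private Map<String, EntityManagerFactory> findDelegates(final ListableBeanFactory lbf) {
        return Stream.of(lbf.getBeanNamesForType(EntityManagerFactory.class))
                .filter(it -> !"mainEntityManagerFactory".equals(it))
                .collect(toMap(identity(), k -> lbf.getBean(k, EntityManagerFactory.class)));
    }

    public interface Marking {}

    // models the tenant lookup; can be a class, an interface is not always needed
    public interface Tenant extends Supplier<String> {
    }
}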

Side note: each delegate must be registered under a valid bean name, since
the bean name is the key the tenant id is matched against (likely make it a
spring extension registering beans from your conf or "properties" models).
The missing part is mainly the Tenant impl, but I guess you already have
something for that ;) - I assume some security context and login metadata.
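
For illustration only, here is a minimal sketch of what such a Tenant impl could look like, assuming the tenant id is held in a ThreadLocal as in the routing data source setup described earlier in the thread. ThreadLocalTenant and runAs are hypothetical names, and the Tenant interface is redeclared here only so the sketch compiles on its own. A real implementation would more likely read the id from the security context.

```java
import java.util.function.Supplier;

// Mirrors RoutedEMFConf.Tenant so that this sketch is self-contained.
interface Tenant extends Supplier<String> {
}

// Hypothetical implementation: keeps the current tenant id in a ThreadLocal.
final class ThreadLocalTenant implements Tenant {
    private final ThreadLocal<String> current = new ThreadLocal<>();

    @Override
    public String get() {
        final String id = current.get();
        if (id == null) {
            // fail fast rather than silently routing to a default EMF
            throw new IllegalStateException("No tenant bound to the current thread");
        }
        return id;
    }

    // Binds tenantId for the duration of the task, then restores the previous value.
    public <T> T runAs(final String tenantId, final Supplier<T> task) {
        final String previous = current.get();
        current.set(tenantId);
        try {
            return task.get();
        } finally {
            if (previous == null) {
                current.remove();
            } else {
                current.set(previous);
            }
        }
    }
}
```

A request filter or interceptor could wrap each call in runAs(...) so that the routed EMF always sees the tenant of the current request. Restoring the previous value in the finally block avoids leaking a tenant id to the next task on a pooled thread.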

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>


On Tue, 9 Jan 2024 at 11:28, Francesco Chicchiriccò <ilgro...@apache.org>
wrote:

> Thanks Romain, I share your considerations and concerns below, and also
> agree that EMF routing is the way to go.
>
> I probably need to adjust my current exploration so that what we
> currently have in Syncope evolves towards proper EMF routing.
>
> Do you have any sample I could follow about that?
>
> Regards.
>
> On 09/01/24 10:51, Romain Manni-Bucau wrote:
> > Hi Francesco,
> >
> > As long as you have an EMF router you don't have pitfall 4: it only
> > happens if your routing is done at datasource level, but that also means
> > you have way more side effects and you start to lose the ability to tune
> > per tenant (a common pattern is to tune the cache per tenant
> > "size"/usage; there everything would be shared, not isolated, so there is
> > no real way to handle anything).
> >
> > Note: having routed caches can make it work somehow but will need a lot
> > of reimplementation of the cache, whereas it is free when using a routed
> > EMF. It can be faked with PartitionedDataCache overriding the key name
> > (appending the tenant), but in terms of supervision I fear it will be way
> > harder and I'm not sure it would be very consumable for people (you end
> > up making the leak risk higher for users by design and you don't get any
> > benefit from that - you don't reduce the overhead, you don't reduce the
> > pool size, etc., which are at another level).
> >
> > In terms of spring-data integration there is no special link either: just
> > declare @Bean EMF routedEmf() and you'll get it working transparently as
> > long as a tx - the cache scope of spring - is for a single tenant.
> >
> > Hope I'm not missing something "key" ;).
> >
> > On Tue, 9 Jan 2024 at 10:32, Francesco Chicchiriccò <ilgro...@apache.org>
> > wrote:
> >
> >> Hi Romain,
> >> see my replies embedded below.
> >>
> >> Regards.
> >>
> >> On 08/01/24 17:43, Romain Manni-Bucau wrote:
> >>> Hi Francesco,
> >>>
> >>> Normally if you have one EMF per tenant there is no leak between them
> >>> since the cache instance is stored in the EMF - used that approach in
> >>> TomEE.
> >>
> >> As I am saying below, this is what we have already in Syncope.
> >>
> >> My company is also supporting customers heavily using this particular
> >> feature: it works, I have no issues with that.
> >> Someone is also building a SaaS solution on top of that, so runtime
> >> tenant addition and removal is also fine.
> >>
> >> I am exploring this different approach because it would allow us to
> >> introduce Spring Data JPA, which could have some benefits - see
> >> https://issues.apache.org/jira/browse/SYNCOPE-1799
> >>
> >>> You can check it in
> >>> org.apache.openjpa.datacache.DataCacheManagerImpl#initialize of each emf
> >>> which should be different.
> >>
> >> Thanks for the pointer.
> >>
> >>> So overall if there is a leak it is likely that it leaks across
> >>> transactions or some spring cache level.
> >>
> >> I think that things are more subtle: consider the following use case.
> >>
> >> We have MyEntity with String @Id.
> >>
> >> Suppose we have two tenants: A and B.
> >>
> >> 1. Tenant A will make a REST call which creates a MyEntity instance with
> >> key "key1" under the db for A.
> >>
> >> 2. Tenant A will make another REST call which looks for the newly
> >> created MyEntity instance via:
> >>
> >> entityManager.find(MyEntity.class, "key1");
> >>
> >> 3. Tenant B makes the same call as (1) with the same key "key1": all is
> >> fine, a new row is created under the db for B.
> >>
> >> 4. Tenant B makes the same call as (2) with the same key "key1": if not
> >> already evicted, entityManager will return the MyEntity instance for
> >> Tenant A from the cache.
> >>
> >> I need to avoid the pitfalls from (4).
> >>
> >>> Side note: the datasource routing pattern is useless if you have an
> >>> entity manager routing pattern and only use JPA to do database work,
> >>> both will more easily conflict than help.
> >>
> >> The idea is not to have an entity manager routing pattern, rather to
> >> have a cache routing pattern on the single entity manager factory; or
> >> just to configure some predefined partitions.
> >>
> >>> If you still want to plug in a custom data cache (query cache), its
> >>> configuration in the jpa properties can take a custom fully qualified
> >>> name too.
> >>>
> >>> On Mon, 8 Jan 2024 at 17:14, Francesco Chicchiriccò <ilgro...@apache.org>
> >>> wrote:
> >>>
> >>>> Hi there,
> >>>> at Syncope we have been implementing multi-tenancy by relying on
> >>>> something like:
> >>>>
> >>>> * 1 data source per tenant
> >>>> * 1 entity manager factory per tenant
> >>>> * 1 transaction manager per tenant
> >>>> * etc
> >>>>
> >>>> So far so good.
> >>>>
> >>>> Now I am experimenting with a different approach similar to [1], e.g.
> >>>>
> >>>> * 1 low-level data source per tenant
> >>>> * 1 data source extending Spring's AbstractRoutingDataSource using the
> >>>> value of a ThreadLocal variable as lookup key
> >>>> * 1 single entity manager factory configured with the routing data
> >>>> source
> >>>> * 1 single transaction manager
> >>>> * etc
> >>>>
> >>>> It mostly works but I am having caching issues with concurrent
> >>>> operations working on different tenants, so I was wondering: how can I
> >>>> extend the various OpenJPA caches (query, data, L1, L2, every one) to
> >>>> hold separate instances per tenant and to use the appropriate one
> >>>> depending on the same ThreadLocal value I have already used above for
> >>>> data sources?
> >>>>
> >>>> Thanks in advance.
> >>>> Regards.
> >>>>
> >>>> [1] https://github.com/Cepr0/sb-multitenant-db-demo
>
>
> --
> Francesco Chicchiriccò
>
> Tirasa - Open Source Excellence
> http://www.tirasa.net/
>
> Member at The Apache Software Foundation
> Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
> http://home.apache.org/~ilgrosso/
>
>
