Re: Multi-tenancy and caching issues

Francesco Chicchiriccò Mon, 15 Jan 2024 01:31:24 -0800

FYI, I've adopted a similar solution: there are still a few things to iron
out, but overall it works.


Thank you.
Regards.

On 09/01/24 11:58, Romain Manni-Bucau wrote:

I don't have everything ready for spring-data, but I had something like
this in mind:


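import static java.util.Objects.requireNonNull;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Stream;

// jakarta.persistence with Spring 6+, javax.persistence on older stacks
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.PersistenceException;

import org.springframework.beans.factory.ListableBeanFactory;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Primary;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;

public class RoutedEMFConf {
    @Bean
    @Primary
    LocalContainerEntityManagerFactoryBean mainEntityManagerFactory(
            final Tenant tenant, final ApplicationContext context) {
        // can be other beans with qualifiers
        final var emfs = findDelegates(context);
        final var routedEmf = EntityManagerFactory.class.cast(Proxy.newProxyInstance(
                RoutedEMFConf.class.getClassLoader(),
                // opt: use SessionFactoryImplementor.class if you need hibernate internals
                new Class<?>[]{EntityManagerFactory.class, Marking.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "equals":
                            // assume there is a single one per app, otherwise complete the impl
                            return args[0] instanceof Marking;
                        case "hashCode":
                            return 1;
                        default:
                            // every other call goes to the EMF of the current tenant
                            try {
                                final var id = tenant.get();
                                return method.invoke(
                                        requireNonNull(emfs.get(id), () -> "No emf for '" + id + "'"),
                                        args);
                            } catch (final InvocationTargetException ite) {
                                // rethrow the delegate's own exception, not the reflection wrapper
                                throw ite.getTargetException();
                            }
                    }
                }
        ));
        return new LocalContainerEntityManagerFactoryBean() {
            @Override
            protected EntityManagerFactory createNativeEntityManagerFactory() throws PersistenceException {
                return routedEmf;
            }
        };
    }

    // maps each delegate EMF bean name (used as tenant id) to its instance
    private Map<String, EntityManagerFactory> findDelegates(final ListableBeanFactory lbf) {
        return Stream.of(lbf.getBeanNamesForType(EntityManagerFactory.class))
                .filter(it -> !"mainEntityManagerFactory".equals(it))
                .collect(toMap(identity(), k -> lbf.getBean(k, EntityManagerFactory.class)));
    }

    public interface Marking {}

    // models the tenant lookup; it can be a class, an interface is not always needed
    public interface Tenant extends Supplier<String> {
    }
}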

Side note: each delegate must have a valid bean name, since it is used as
the tenant id (likely make it a spring extension registering beans from your
conf or "properties" models).
The missing part is mainly the Tenant impl, but I guess you already have
something for that ;) - I assume some security context and metadata from
the login.
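
For readers following along, the missing Tenant part could look roughly
like the sketch below. It is only an illustration under assumptions: the
class name ThreadLocalTenant and the callAs helper are hypothetical, and it
implements Supplier<String> directly to stay self-contained (in the
configuration above it would implement RoutedEMFConf.Tenant instead). It
binds the tenant id to the current thread and restores the previous binding
afterwards, which is what a security filter would typically do around a
request.

```java
import java.util.function.Supplier;

/** Hypothetical ThreadLocal-backed tenant holder; names are illustrative only. */
final class ThreadLocalTenant implements Supplier<String> {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Returns the tenant bound to the current thread, failing fast when none is bound. */
    @Override
    public String get() {
        final String id = CURRENT.get();
        if (id == null) {
            throw new IllegalStateException("No tenant bound to the current thread");
        }
        return id;
    }

    /** Runs the task with the given tenant bound, then restores the previous binding. */
    static <T> T callAs(final String tenantId, final Supplier<T> task) {
        final String previous = CURRENT.get();
        CURRENT.set(tenantId);
        try {
            return task.get();
        } finally {
            if (previous == null) {
                CURRENT.remove(); // avoid leaking the tenant into pooled threads
            } else {
                CURRENT.set(previous);
            }
        }
    }
}
```

A request handler would then wrap its work as
ThreadLocalTenant.callAs("A", () -> service.doWork()), where service is
whatever the application already has.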

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>


On Tue, 9 Jan 2024 at 11:28, Francesco Chicchiriccò <[email protected]>
wrote:

Thanks Romain, I share your considerations and concerns below, and also
agree that EMF routing is the way to go.

I probably need to adjust my current exploration so that what we
currently have in Syncope can evolve towards proper EMF routing.

Do you have a sample I could follow for that?

Regards.

On 09/01/24 10:51, Romain Manni-Bucau wrote:

Hi Francesco,

As long as you have an EMF router you don't have pitfall 4; it only happens
if your routing is done at datasource level, but that also means you have
way more side effects and you start to lose the ability to tune per tenant
(a common pattern is to tune the cache per tenant "size"/usage; there,
everything would be shared, not isolated, so there is no real way to handle
anything).

Note: having routed caches can make it work somehow, but it will need a lot
of reimplementation of the cache, whereas it comes for free when using a
routed EMF. It can be faked with PartitionedDataCache overriding the key
name (appending the tenant), but in terms of supervision I fear it will be
way harder and I'm not sure it would be very consumable for people (you end
up making the leak risk higher for users by design and you don't get any
benefit from that: you don't reduce the overhead, you don't reduce the
pool size etc., which are at another level).
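
The per-tenant tuning point can be pictured with plain Java collections.
This is a minimal sketch and not OpenJPA's data cache: TenantCaches and its
methods are hypothetical names, and a bounded LRU map stands in for the
cache that each tenant's EMF would own. The idea it shows is that each
tenant gets its own instance with its own capacity, so eviction in one
tenant never touches another.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of per-tenant LRU caches; not an OpenJPA API. */
final class TenantCaches {
    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    /** Registers an isolated cache for the tenant, bounded to maxEntries. */
    void register(final String tenantId, final int maxEntries) {
        // access-ordered LinkedHashMap that evicts the least recently used entry
        final Map<Object, Object> lru = new LinkedHashMap<Object, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<Object, Object> eldest) {
                return size() > maxEntries;
            }
        };
        caches.put(tenantId, Collections.synchronizedMap(lru));
    }

    /** Returns the cache of the given tenant, failing when the tenant is unknown. */
    Map<Object, Object> of(final String tenantId) {
        final Map<Object, Object> cache = caches.get(tenantId);
        if (cache == null) {
            throw new IllegalArgumentException("No cache for '" + tenantId + "'");
        }
        return cache;
    }
}
```

With a single shared cache there is only one capacity to set, which is the
limitation described above.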

In terms of spring-data integration there is no coupling either: just
declare @Bean EMF routedEmf() and you'll get it working transparently, as
long as a tx (the cache scope of spring) is for a single tenant.

Hope I'm not missing something "key" ;).


On Tue, 9 Jan 2024 at 10:32, Francesco Chicchiriccò <[email protected]>
wrote:

Hi Romain,
see my replies embedded below.

Regards.

On 08/01/24 17:43, Romain Manni-Bucau wrote:

Hi Francesco,

Normally if you have one EMF per tenant there is no leak between them
since the cache instance is stored in the EMF - used that approach in
TomEE.

As I am saying below, this is what we have already in Syncope.

My company is also supporting customers heavily using this particular
feature: it works, I have no issues with that.
Someone is also building a SaaS solution on top of that, so runtime tenant
addition and removal is also fine.

I am exploring this different approach because it would allow us to
introduce Spring Data JPA, which could have some benefits - see
https://issues.apache.org/jira/browse/SYNCOPE-1799

You can check it in
org.apache.openjpa.datacache.DataCacheManagerImpl#initialize of each emf
which should be different.

Thanks for the pointer.

So overall if there is a leak it is likely that it leaks across
transactions or some spring cache level.

I think that things are more subtle: consider the following use case.

We have MyEntity with String @Id.

Suppose we have two tenants: A and B.

1. Tenant A will make a REST call which creates a MyEntity instance with
key "key1" under the db for A.

2. Tenant A will make another REST call which looks for the newly created
MyEntity instance via:

entityManager.find(MyEntity.class, "key1");

3. Tenant B makes the same call as (1) with the same key "key1": all is
fine, a new row is created under the db for B.

4. Tenant B makes the same call as (2) with the same key "key1": if not
already evicted, entityManager will return the MyEntity instance for Tenant
A from the cache.

I need to avoid the pitfalls from (4).
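
The four steps above can be reproduced with a toy model. This is a sketch
with plain maps standing in for the per-tenant databases and for the shared
cache of a single EMF, not OpenJPA code; SharedCacheLeakDemo and its
methods are hypothetical names. With keys made of the entity id only, the
lookup for tenant B is served from the entry loaded for tenant A; with
tenant-qualified keys (the same idea as appending the tenant to the cache
key) each tenant sees its own row.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the scenario: maps stand in for the databases and the shared cache. */
final class SharedCacheLeakDemo {
    // one "database" per tenant, as with a routing data source
    private final Map<String, Map<String, String>> databases = new HashMap<>();
    // a single cache shared by all tenants, as with a single EMF
    private final Map<String, String> cache = new HashMap<>();
    private final boolean tenantAwareKeys;

    SharedCacheLeakDemo(final boolean tenantAwareKeys) {
        this.tenantAwareKeys = tenantAwareKeys;
    }

    /** Steps (1) and (3): writes the row into the database of the given tenant. */
    void persist(final String tenant, final String id, final String value) {
        databases.computeIfAbsent(tenant, t -> new HashMap<>()).put(id, value);
    }

    /** Steps (2) and (4): looks in the cache first, then loads from the tenant's database. */
    String find(final String tenant, final String id) {
        final String key = tenantAwareKeys ? tenant + ':' + id : id;
        return cache.computeIfAbsent(
                key, k -> databases.getOrDefault(tenant, Map.of()).get(id));
    }
}
```

Constructed with tenantAwareKeys set to false, find("B", "key1") returns the
value stored by tenant A once A has looked it up; with true, it returns the
value stored by B.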

Side note: the datasource routing pattern is useless if you have an
entity manager routing pattern and only use JPA to do database work; both
will more easily conflict than help.

The idea is not to have an entity manager routing pattern, but rather to
have a cache routing pattern on the single entity manager factory; or just
to configure some predefined partitions.

If you still want to plug in your own data cache (or query cache), the
configuration in the JPA properties can take a custom fully qualified class
name too.

On Mon, 8 Jan 2024 at 17:14, Francesco Chicchiriccò <[email protected]>
wrote:

Hi there,
at Syncope we have been implementing multi-tenancy by relying on something
like:

* 1 data source per tenant
* 1 entity manager factory per tenant
* 1 transaction manager per tenant
* etc

So far so good.

Now I am experimenting with a different approach similar to [1], e.g.

* 1 low-level data source per tenant
* 1 data source extending Spring's AbstractRoutingDataSource using the
value of a ThreadLocal variable as lookup key
* 1 single entity manager factory configured with the routing data source

* 1 single transaction manager
* etc

It mostly works but I am having caching issues with concurrent operations
working on different tenants, so I was wondering: how can I extend the
various OpenJPA caches (query, data, L1, L2, every one) to hold different
actual instances per tenant and to use the appropriate one depending on the
same ThreadLocal value I have already used above for data sources?

Thanks in advance.
Regards.

[1] https://github.com/Cepr0/sb-multitenant-db-demo


--
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

Member at The Apache Software Foundation
Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
http://home.apache.org/~ilgrosso/
