Hi Kasper,
I'd happily contribute. I have actually already started working on it, but
soon discovered that there might be a lot of side effects.
So the basic ideas I had were:
1. Start with in-memory datasets.
2. Stream non-rewindable datasets.
3. Directly apply filters to every row as it is created.
4. Join first between datasets that have filters, in order to prevent
Cartesian products.
And further:
5. When joining, avoid nested loops and instead build indexes for the
in-memory datasets. Not sure about this one, though.
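As a rough illustration of ideas 2, 3 and 5, here is a minimal sketch of an
indexed join. It is not the code in the fork and it does not use MetaModel's
own classes: rows are plain Object[] arrays, and IndexedJoinSketch and
hashJoin are made-up names, so treat it as an assumption about how this
could look rather than a finished design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; rows are plain Object[] arrays, not MetaModel Row objects.
public class IndexedJoinSketch {

    /**
     * Joins an in-memory "build" side with a streamed "probe" side on a
     * single equality condition. Rows with a null join key never match.
     */
    public static List<Object[]> hashJoin(
            List<Object[]> buildRows, int buildKeyIndex,
            Iterator<Object[]> probeRows, int probeKeyIndex,
            Predicate<Object[]> buildFilter, Predicate<Object[]> probeFilter) {

        // Ideas 3 and 5: filter each build row as it is read, then index it by join key.
        Map<Object, List<Object[]>> index = new HashMap<>();
        for (Object[] row : buildRows) {
            Object key = row[buildKeyIndex];
            if (key != null && buildFilter.test(row)) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
            }
        }

        // Idea 2: the probe side is consumed exactly once, so it need not be rewindable.
        List<Object[]> result = new ArrayList<>();
        while (probeRows.hasNext()) {
            Object[] probe = probeRows.next();
            Object key = probe[probeKeyIndex];
            if (key == null || !probeFilter.test(probe)) {
                continue;
            }
            List<Object[]> matches = index.get(key);
            if (matches == null) {
                continue;
            }
            // Emit one combined row per matching build row: build columns first, then probe columns.
            for (Object[] build : matches) {
                Object[] joined = new Object[build.length + probe.length];
                System.arraycopy(build, 0, joined, 0, build.length);
                System.arraycopy(probe, 0, joined, build.length, probe.length);
                result.add(joined);
            }
        }
        return result;
    }
}
```

With this shape, only the filtered build side is held in memory and each
probe row costs one hash lookup instead of a pass over the other dataset.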
Feedback is appreciated.
I already started a fork
(https://github.com/tomatophantastico/metamodel). As soon as it works,
I'll write another email before opening any pull requests.
Best,
Jörg
On 02.05.16 at 06:56, Kasper Sørensen wrote:
Hi Jörg,
You're right about the very naive behaviour of that method. It could
_certainly_ use an optimization or two. I can only speak for myself, but I
just never used MetaModel much for joins and thus never gave it much
thought. Looking at the code I'm thinking that we can do much better.
Would you be interested in working on improving this condition? If so I
will happily share insights and ideas on how we can pull it off.
Cheers,
Kasper
2016-05-01 4:12 GMT-07:00 Jörg Unbehauen <
[email protected]>:
Hi all,
We just tried out MetaModel with MongoDB and ran a simple join (as
in select * from t1 join t2 on (t1.id = t2.oid)) between two collections,
each containing roughly 10,000 documents. Using a developer setup on a Mac,
we did not get a result, as the system was more or less stuck.
A quick examination revealed that
MetaModelHelper.getCarthesianProduct(DataSet[] fromDataSets,
Iterable<FilterItem> whereItems) consumes most of the resources.
This implementation first computes the Cartesian product in memory and
then applies the filters to it.
I wonder what the rationale behind this implementation is, as it will not
scale well, even for selective joins.
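To make the scaling concern concrete: with roughly 10,000 rows on each
side, the product holds 10,000 x 10,000 = 100,000,000 combined rows before
any filter is applied. The sketch below is a simplified model of that
behaviour, not the actual MetaModel source; CartesianCostSketch and
productThenFilter are hypothetical names, and each row is reduced to a
single int key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "build the full product, then filter"; not MetaModel code.
public class CartesianCostSketch {

    /**
     * Materializes every pairing of left and right keys, then keeps the pairs
     * whose keys are equal. Returns { intermediate row count, matching row count }.
     */
    public static long[] productThenFilter(int[] leftKeys, int[] rightKeys) {
        // Step 1: the whole product is held in memory at once.
        List<int[]> product = new ArrayList<>();
        for (int left : leftKeys) {
            for (int right : rightKeys) {
                product.add(new int[] { left, right });
            }
        }

        // Step 2: only now is the join condition applied.
        long matches = 0;
        for (int[] pair : product) {
            if (pair[0] == pair[1]) {
                matches++;
            }
        }
        return new long[] { product.size(), matches };
    }
}
```

For two inputs of n distinct keys each, the intermediate list grows as n * n
while the number of surviving rows is at most n, which is why even a
selective join is expensive this way.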
Or am I using MetaModel wrong here, as in: the join should never be
computed by getCarthesianProduct()?
The problem appears to me to be a general one, so I did not supply a code
example.
Best,
Jörg