Parallel SQL join on multivalued fields

Piero Scrima Wed, 01 Jul 2020 02:23:00 -0700

Hi,

I don't know if this is the right place for my question, anyway I'll try to
explain the issue here and understand together with you if it's worth
working on it.
I'm working with the parallel sql feature of Solr. Even though , looking at
the documentation, the join seems not supported, the join works. I gave a
look to the code and I understood that it works thanks to the calcite
features, (the framework on top of which is is build parellel sql feature).
My project doesn't need to works with big amount of data, and I think that
the calcite join feature can work well for my use case.
The problems arise when I need to join two multivalued field.
In parallel sql, the JOIN operation on two multivalued field seems to works
matching the two fields as a unique string, so that, for example , a
document with the join field like this :["a","b"]; will match with a
document with a join field exactly equal, like this: ["a","b"]; otherwise
if even only one element is different, they do not match.The right way to
do this should be first to explode the document in more document as much as
the number of element in the multivalued field (cross product on the field)
and then perform the join. I managed to solve the problem using streaming
expression:


innerJoin(
    sort(
        cartesianProduct(

search(census_defence_system,q="*:*",fl="id,defence_system,description,supplier",sort="id
asc",qt="/select",rows="1000"),
          supplier
        ),
    by="supplier asc"
    ),
    sort(
      cartesianProduct(

search(census_components,q="*:*",fl="id,compoenent_name,supplier",sort="id
asc",qt="/select",rows="10000"),
        supplier
        ),
        by="supplier asc"
    ),
  on="supplier"
)
with suplier as a multivalued field. It works very well.
Anyway it would be great if the JOIN of multivalued field performed this
behavior with cartesian product also in parallel sql.
I think this could be a very powerful improvement, apply sql in  a not
normalized collection/table. It could be possible to implement this
feature?  I will be very glad to work on it.

Thank you,

Piero

Parallel SQL join on multivalued fields

Reply via email to