Hi All,

+1 for solution 2, but don't store the rowid: it makes the SI storage very
large and slows performance considerably. Let's keep the current SI model,
which stores positions only down to the blocklet level, and not complicate
things by storing the rowid.
Solution 1 makes the scan slower because it has to construct the complex
row for every row. It is better to flatten the data to get better scan
performance and more compact storage.
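
To illustrate the storage point, here is a rough sketch (not CarbonData
code) that counts index entries for the same flattened array data with and
without the rowid. The tuples and the blocklet names are made up for the
example.

```python
# Hypothetical flattened array elements: (value, blocklet_id, row_id).
# These values are illustrative only, not taken from a real table.
elements = [
    ("bangalore", "blocklet_0", 0),
    ("bangalore", "blocklet_0", 1),
    ("bangalore", "blocklet_0", 2),
    ("chennai", "blocklet_0", 2),
    ("bangalore", "blocklet_1", 0),
]

# Row-level reference: one entry per (value, blocklet, row).
row_level = {(value, blocklet, row) for value, blocklet, row in elements}

# Blocklet-level reference: repeated values inside a blocklet collapse
# into a single entry.
blocklet_level = {(value, blocklet) for value, blocklet, _ in elements}

print(len(row_level), len(blocklet_level))
```

The more often a value repeats inside a blocklet, the bigger the gap between
the two counts becomes.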

Consider the following approach.
*Array:* Flatten each row into multiple rows, one per element, and store the
position only down to the blocklet id.
*Struct:* The user decides which element to index. For example, with
*emp:struct<name: String, address: String>*, the user can create a separate
SI on an individual field such as *emp.name* or *emp.address*.
*Map:* Flatten the data in the same way as Array, but the user must choose
whether the SI is on the map key or the map value. To index both, the user
can create a separate SI for each.
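
A minimal sketch of the three cases, assuming each input row is a
(blocklet_id, value) pair. The function names and sample data are
hypothetical and only show the shape of the index entries, not how
CarbonData would actually build them.

```python
def flatten_array(rows):
    """Array: one (element, blocklet_id) entry per distinct element per blocklet."""
    return sorted({(elem, blocklet) for blocklet, values in rows for elem in values})


def struct_field(rows, field):
    """Struct: index only the field the user picked, e.g. 'name' for emp.name."""
    return sorted({(record[field], blocklet) for blocklet, record in rows})


def flatten_map(rows, on="key"):
    """Map: flatten like an array, on either the keys or the values."""
    if on not in ("key", "value"):
        raise ValueError("on must be 'key' or 'value'")
    return sorted(
        {(k if on == "key" else v, blocklet)
         for blocklet, mapping in rows
         for k, v in mapping.items()}
    )


# Illustrative data only.
array_rows = [("b0", ["x", "y"]), ("b0", ["x"]), ("b1", ["y", "z"])]
struct_rows = [
    ("b0", {"name": "ravi", "address": "blr"}),
    ("b1", {"name": "ajantha", "address": "blr"}),
]
map_rows = [("b0", {"k1": "v1", "k2": "v2"}), ("b1", {"k1": "v3"})]

array_si = flatten_array(array_rows)
name_si = struct_field(struct_rows, "name")
map_key_si = flatten_map(map_rows, on="key")
map_value_si = flatten_map(map_rows, on="value")
```

In this sketch a filter such as array_contains(col, 'x') would look up 'x'
in array_si and prune the scan to blocklet b0. The matching rows inside that
blocklet are still found by the normal scan.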

Regards,
Ravindra.

On Thu, 30 Jul 2020 at 17:35, Ajantha Bhat <[email protected]> wrote:

> Hi David & Indhumathi,
> Storing Array of String as just String column in SI by flattening [with row
> level position reference] can result in slow performance in case of
> * Multiple array_contains() or multiple array[0] = 'x'
> * The join solution mentioned can result in multiple scans (one for every
> complex filter condition), which can slow down the SI performance.
> * Row level SI can slow down SI performance when the filter returns a huge
> number of results.
> * To support multiple SI on a single table, complex SI will become row
> level position reference and primitive will become blocklet level position
> reference. This needs extra logic and time for the join.
> * Solution 2 cannot support struct column SI in the future. So, it cannot
> be a generic solution.
>
> Considering the above points, *solution2 is a very good solution if only
> one filter exists* for the complex column. *But it is not a good solution
> for all scenarios.*
>
> *So, I have to go with solution1, or wait for other people's opinions
> or new solutions.*
>
> Thanks,
> Ajantha
>
> On Thu, Jul 30, 2020 at 1:19 PM David CaiQiang <[email protected]>
> wrote:
>
> > +1 for solution2
> >
> > Can we support more than one array_contains by using SI join (like SI on
> > primitive data type)?
> >
> >
> >
> > -----
> > Best Regards
> > David Cai
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>


-- 
Thanks & Regards,
Ravi
