Hey Flavio!

Working on the dev list is great, whatever you prefer.

On 12/03/2015 12:32 AM, Flavio Pompermaier wrote:
    - is it better to add a specific operator for repeated columns (e.g.
    in/contains) or to modify equal and not-equal so they can also be used
    on repeated columns?

I think it is better to add an in/contains operator. It fits the use cases better, and it is the basis for an equivalence operator anyway.

A lot of nesting is used to represent denormalized relationships, like a session table with a list of actions that make up a session. An in or contains operation is good for those situations, where you want to find sessions with a certain action and then see what other actions are there.
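
As a rough illustration of the session case, here is a sketch in plain Python over in-memory records. It is not Parquet's filter API; the `sessions` data and the `sessions_with_action` helper are made up for this example.

```python
# Hypothetical denormalized data: each session carries a list of actions.
sessions = [
    {"id": 1, "actions": ["login", "search", "purchase"]},
    {"id": 2, "actions": ["login", "logout"]},
    {"id": 3, "actions": ["search", "purchase", "logout"]},
]

def sessions_with_action(records, action):
    """Keep the records whose repeated 'actions' field contains `action`."""
    return [r for r in records if action in r["actions"]]

# Find the sessions with a certain action...
matches = sessions_with_action(sessions, "purchase")

# ...then look at the other actions in those sessions.
other_actions = {
    r["id"]: [a for a in r["actions"] if a != "purchase"] for r in matches
}
```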

Equivalence (unordered) would then be a contains query for all of the required elements combined with a size check: [a,b] contains a, contains b, and has size 2. Equality would be an ordered contains query with the same size check. So implementing contains is the first step.
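
To make the composition concrete, here is a minimal sketch in plain Python over in-memory lists. The function names are hypothetical and nothing here is an existing Parquet predicate. It only shows how equivalence and equality could be built from contains plus a size check.

```python
def contains(values, element):
    """True if the repeated field `values` has at least one `element`."""
    return element in values

def equivalent_unordered(values, required):
    """Contains every required element, plus a size check."""
    return len(values) == len(required) and all(
        contains(values, r) for r in required
    )

def contains_ordered(values, required):
    """True if the required elements appear in `values` in the given order."""
    it = iter(values)
    # Each any() call resumes the shared iterator where the previous match
    # stopped, so matches must occur in order.
    return all(any(v == r for v in it) for r in required)

def equal_ordered(values, required):
    """Ordered contains, plus a size check."""
    return len(values) == len(required) and contains_ordered(values, required)
```

One caveat with this sketch: when the required list has duplicates, contains plus size is looser than strict multiset equality. For example, ["a", "b", "b"] passes the check against ["a", "a", "b"], so a real implementation would likely need to count occurrences.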

    - do you think that PARQUET-35 is necessary to proceed with contains?

No, I think this is something we can do as needed. PARQUET-35 seems too general.

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
