Hey Flavio!
Working on the dev list is great, whatever you prefer.
On 12/03/2015 12:32 AM, Flavio Pompermaier wrote:
> - is it better to add a specific operator for repeated columns (e.g.
> in/contains) or modify equal and not-equal to be usable also on repeated
> columns?
I think it is better to add an in/contains operator, because it seems to
fit the use cases better and is essentially the basis for an
equivalence operator.
A lot of nesting is used to represent denormalized relationships, like a
session table with a list of actions that make up a session. An in or
contains operation is good for those situations, where you want to find
sessions with a certain action and then see what other actions are there.
Unordered equivalence would then be a contains query for all of the
required elements combined with a size check ([a,b] contains a and b and
has size 2), and equality would be an ordered contains query with a size
check. So implementing contains is the first step.
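Here is a rough Java sketch of these semantics, evaluated over the values of one repeated field in a single record. The class and method names (RepeatedColumnPredicates, contains, equivalent, equal) are hypothetical and are not Parquet's filter API. The unordered case assumes the required elements are distinct. Otherwise a record [a,b] would match the target [a,a], because contains-plus-size cannot count duplicates.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helpers illustrating contains / equivalence / equality on the
// values of a repeated column for one record. Not Parquet's actual API.
public class RepeatedColumnPredicates {

    // contains: the repeated value holds the given element anywhere.
    static <T> boolean contains(List<T> values, T element) {
        return values.contains(element);
    }

    // Contains every required element, in any order.
    static <T> boolean containsAll(List<T> values, List<T> required) {
        for (T r : required) {
            if (!contains(values, r)) {
                return false;
            }
        }
        return true;
    }

    // Unordered equivalence = contains all + same size.
    // Assumes the required elements are distinct (see lead-in).
    static <T> boolean equivalent(List<T> values, List<T> required) {
        return containsAll(values, required) && values.size() == required.size();
    }

    // Ordered contains: required elements appear in values in the same
    // relative order (a greedy subsequence check).
    static <T> boolean containsInOrder(List<T> values, List<T> required) {
        int i = 0;
        for (T v : values) {
            if (i < required.size() && Objects.equals(v, required.get(i))) {
                i++;
            }
        }
        return i == required.size();
    }

    // Ordered equality = ordered contains + same size.
    static <T> boolean equal(List<T> values, List<T> required) {
        return containsInOrder(values, required) && values.size() == required.size();
    }

    public static void main(String[] args) {
        List<String> session = List.of("login", "search", "logout");
        System.out.println(contains(session, "search"));                              // true
        System.out.println(equivalent(session, List.of("logout", "login", "search"))); // true
        System.out.println(equal(session, List.of("logout", "login", "search")));      // false
        System.out.println(equal(session, List.of("login", "search", "logout")));      // true
    }
}
```

The session example above maps directly onto contains: a session matches if any of its actions is "search". The other two predicates only add a size check on top of it.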
> - do you think that PARQUET-35 is necessary to proceed with contains?
No, I think this is something we can do as needed. PARQUET-35 seems too
general to block on.
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.