Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table

Christian Beikov Sat, 30 Jun 2018 09:19:23 -0700

I like the idea of the regex filter. It might be cool to have something like that in general for all adapters, but it's fine if you do it just for ES now. I guess you are considering include and exclude pattern parameters?

I'm more for a mode parameter rather than letting the user decide the name explicitly. Either the types or the index names will have to have meaningful unique names, or the user will have to map certain indexes to a different schema. IMO that's a good solution.



Kind regards,
------------------------------------------------------------------------
*Christian Beikov*
On 30.06.2018 at 16:43, Andrei Sereda wrote:

Christian / Michael,

Can you please weigh in with your preferred solution and I'll implement it.

One more question. Sometimes it is nice to be able to filter (limit)
the indexes (tables) exposed by Calcite. Say my cluster has 10 indexes but I
want users to query only one. Would you be opposed if I add a configuration
parameter that allows specifying a filter (e.g. a regexp) for ES indexes?
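
To make the filter idea concrete, here is a minimal sketch in Python (not
Calcite code). The function name and the "include" / "exclude" parameters are
hypothetical; the thread only proposes some regexp-based filter and does not
fix its shape.

```python
import re
from typing import Iterable, List, Optional


def filter_indexes(names: Iterable[str],
                   include: Optional[str] = None,
                   exclude: Optional[str] = None) -> List[str]:
    """Keep index names that fully match `include` and do not match `exclude`.

    Both patterns are optional; with neither set, every index is exposed.
    (Parameter names are assumptions made for this illustration.)
    """
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    kept = []
    for name in names:
        if inc is not None and not inc.fullmatch(name):
            continue  # not selected by the include pattern
        if exc is not None and exc.fullmatch(name):
            continue  # explicitly excluded
        kept.append(name)
    return kept
```

For a cluster with indexes "logs-2017", "logs-2018" and "users", calling
filter_indexes(names, include=r"logs-.*", exclude=r".*2017") would expose only
"logs-2018" as a table.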


On Fri, Jun 29, 2018 at 11:17 PM Andrei Sereda <[email protected]> wrote:

That's a reasonable alternative.

On Fri, Jun 29, 2018 at 7:57 PM Julian Hyde <[email protected]> wrote:

Maybe there could be a separator char as one of the adapter’s parameters.
People should choose a value, say ‘$’ or ‘#’, that is legal in an unquoted
SQL identifier but does not occur in any of their index or type names.

If not specified, the adapter would end up in a simple mode, say looking
for indexes first, then looking for types, and people would need to make
sure indexes and types have distinct names. After the transition to
single-type indexes, people could stop using the parameter.
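
A rough sketch of how the two modes described above could resolve a table
name, written in Python for illustration only. The "catalog" structure (index
name mapped to its set of type names) and the function name are assumptions,
not part of the adapter.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_table(name: str,
                  catalog: Dict[str, Set[str]],
                  separator: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Map a table name to an (index, type) pair; type is None for a bare index."""
    if separator is not None and separator in name:
        # Separator mode: split once on the user-chosen character.
        index, type_ = name.split(separator, 1)
        if type_ in catalog.get(index, set()):
            return index, type_
        raise KeyError(name)
    # Simple mode: look for an index first, then for a type.
    if name in catalog:
        return name, None
    owners = [idx for idx, types in catalog.items() if name in types]
    if len(owners) == 1:
        return owners[0], name
    raise KeyError(name)  # unknown, or the type name exists in several indexes
```

With separator "$", the names "x_y$z" and "x$y_z" stay distinct even though
both indexes and types contain underscores.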

Julian

On Jun 29, 2018, at 4:43 PM, Andrei Sereda <[email protected]> wrote:

That's a valid point. Then the user would define a different pattern like
"i$index_t$type" for his cluster.

I think we should first answer whether such scenarios should be supported
by Calcite (given that they're already deprecated by the vendor). If yes,
what should the collision strategy be? A user-defined pattern like the one
above, a failure, or an auto-generated name?

On Fri, Jun 29, 2018, 19:14 Julian Hyde <[email protected]> wrote:

In Elastic, an (index/type) pair is guaranteed to be unique, therefore
"${index}_${type}" will also be unique (as a string). This is only necessary
when we have several types per index. A valid question is whether the user
should be allowed such flexibility.

Uniqueness is not my concern.

Suppose there is an index called "x_y" with a type called "z", and
another index called "x" with a type called "y_z". If I write "x_y_z"
it's not clear how it should be broken into index/type.
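
The ambiguity can be shown with a few lines of Python (an illustrative
sketch; the helper name is made up): every position of the separator is a
candidate split point, so one table name can correspond to several
index/type pairs.

```python
from typing import List, Tuple


def possible_splits(table_name: str, separator: str = "_") -> List[Tuple[str, str]]:
    """Return every (index, type) pair that joins back into table_name."""
    parts = table_name.split(separator)
    return [
        (separator.join(parts[:i]), separator.join(parts[i:]))
        for i in range(1, len(parts))
    ]
```

For "x_y_z" this yields both ("x", "y_z") and ("x_y", "z"), which are exactly
the two readings described above.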


On Fri, Jun 29, 2018 at 3:15 PM, Andrei Sereda <[email protected]> wrote:

Can you show how those examples affect SQL against the ES adapter
and/or how they affect JSON models?

The discussion is about how to properly bridge the (index/type) concept
from ES into the relational world. The proposal to use placeholders
($index / $type) affects only how a table is named in Calcite. They're not
used as SQL literals, i.e. it affects only the configuration phase of the
schema. Pretty much, we're doing a string replace to derive the table name
from ($index/$type).

You seem to be using '_' as a separator character. Are we sure that
people will never use it in index or type name? Separator characters
often cause problems.

In Elastic, an (index/type) pair is guaranteed to be unique, therefore
"${index}_${type}" will also be unique (as a string). This is only necessary
when we have several types per index. A valid question is whether the user
should be allowed such flexibility.



On Fri, Jun 29, 2018 at 2:19 PM Julian Hyde <[email protected]> wrote:

Andrei,

I'm not an ES user so I don't fully understand this issue, but my two
cents anyway...

Can you show how those examples affect SQL against the ES adapter
and/or how they affect JSON models?

You seem to be using '_' as a separator character. Are we sure that
people will never use it in index or type name? Separator characters
often cause problems.

Julian




On Fri, Jun 29, 2018 at 10:58 AM, Andrei Sereda <[email protected]> wrote:

I agree there should be a configuration option. How about the following
approach?

Expose both variables ${index} and ${type} in the configuration (JSON), and
the user will use them to generate the table name in the Calcite schema.

Example
"table_name": "${type}" // current
"table_name": "${index}" // new (default?)
"table_name": "${index}_${type}" // most generic. supports multiple types per index
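
As an illustration of the placeholder idea (a sketch only, not the adapter's
implementation), Python's standard string.Template happens to use the same
${name} syntax, so the three patterns above can be tried directly. The
function name is hypothetical.

```python
from string import Template


def table_name(pattern: str, index: str, type_: str) -> str:
    """Derive a table name by substituting ${index} and ${type} in pattern."""
    return Template(pattern).substitute(index=index, type=type_)
```

With index "sales" and type "order", the patterns "${type}", "${index}" and
"${index}_${type}" give "order", "sales" and "sales_order" respectively. One
caveat of this particular templating syntax: an unbraced placeholder followed
by an underscore, as in "$index_t", is read as a variable named "index_t", so
braces are the safer form.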





On Fri, Jun 29, 2018 at 9:26 AM Michael Mior <[email protected]> wrote:

I think it sounds like you and Andrei are in a good position to tackle this
one so I'm happy to have you both work on whatever solution you think is
best.

--
Michael Mior
[email protected]



On Fri, 29 Jun 2018 at 04:19, Christian Beikov <[email protected]> wrote:

IMO the best solution would be to make it configurable by introducing a
"table_mapping" config with values

  * type - every type in the known indices is mapped as a table
  * index - every known index is mapped as a table

We'd probably also need a "type_field" configuration for defining which
field to use for the type determination, as one of the possible future
ways to do things is to introduce a custom field:

https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html#_custom_type_field_2

We already detect the ES version, so we can set a smart default for this
setting. Let's make the index config param optional.

  * When no index is given, we discover indexes; the default for
    "table_mapping" is then "index"
  * When an index is given, then we only discover types according to the
    "type_field" configuration, and the default for "table_mapping" is
    "type"

This would also allow discovering indexes while still using "type" as
"table_mapping".
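
The defaulting rule proposed above can be sketched as follows (Python, for
illustration; the function name is an assumption, and only the "index" /
"type" values and the two defaults come from the proposal).

```python
from typing import Optional

VALID_MAPPINGS = ("index", "type")


def effective_table_mapping(index: Optional[str],
                            table_mapping: Optional[str] = None) -> str:
    """Return the table mapping to use for a schema configuration.

    An explicit "table_mapping" always wins. Otherwise the default is
    "index" when no index is configured (indexes are discovered) and
    "type" when a single index is given.
    """
    if table_mapping is not None:
        if table_mapping not in VALID_MAPPINGS:
            raise ValueError(f"unknown table_mapping: {table_mapping!r}")
        return table_mapping
    return "index" if index is None else "type"
```

The last case in the proposal, discovering indexes while still mapping types
as tables, corresponds to effective_table_mapping(None, "type").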

What do you think?

Kind regards,
------------------------------------------------------------------------
*Christian Beikov*
On 29.06.2018 at 02:41, Andrei Sereda wrote:

Yes. There is an API to list all indexes / types in Elastic. They can be
automatically imported into a schema.

What needs to be agreed upon is how to expose those elements in the Calcite
schema (naming / behaviour).

1) Many (most?) setups are single type per index. The natural way to name
them would be "elastic.$index" (elastic being the schema name). Multiple
indexes would be under the same schema: "elastic.index1", "elastic.index2",
etc.

2) What if an index has several types? Should they be exported as Calcite
tables "elastic.$index_type1" and "elastic.$index_type2"? Or (current
behaviour) as "elastic.type1" and "elastic.type2"? Or as a subschema,
"elastic.$index.type1"?

Now what if one has a combination of (1) and (2)?
Setup (2) is already deprecated (and will be unsupported in the next
version).


On Thu, Jun 28, 2018 at 7:31 PM Christian Beikov <[email protected]> wrote:

Is there an API to discover indexes? If there is, I'd suggest we allow a
config option to make the adapter discover the possible indexes.
We'd still have to adapt the code a bit, but internally, the schema
could just keep a cache of type name to index name map and be able to
support both scenarios.


Kind regards,
------------------------------------------------------------------------
*Christian Beikov*
On 29.06.2018 at 00:12, Andrei Sereda wrote:

1) What's the time horizon for the current adapter no longer working
with these changes to ES?

The current adapter will keep working for a while with the existing setup.
The problem is nomenclature and ease of use.

Their new SQL concepts mapping
<https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>
drops the notion of ES type (which before was the equivalent of an RDBMS
table) and uses the ES index as the new table equivalent (before, an ES
index was equal to a database).

Most users use Elastic this way (one type, one index): index == table.

Currently Calcite requires a schema per index. In RDBMS parlance, a
database per table (I'd like to change that).

2) Any guess how complicated it would be to maintain code paths for both
behaviours? I know this is probably really challenging to estimate, but I
really have no idea of the scope of these changes. Would it mean two
different ES adapters?

One can have just separate Calcite schema implementations (same adapter /
module):
1)  LegacySchema (old). A schema can have only one index (but multiple
types). Type == table in this case.
2)  NewSchema (new). A single schema can have multiple indexes (type is
dropped). Index == table in this case.

3) Do we really need compatibility with the current version of the
adapter?

IMO this depends on what versions of ES we would lose support for and how
complex it would be for users of the current ES adapter to make updates for
any Calcite API changes.

The issue is not in the adapter but in how the Calcite schema exposes
tables. Should it expose an index as an individual table (new), or an ES
type (old)?

Andrei.

On Thu, Jun 28, 2018 at 5:23 PM Michael Mior <[email protected]> wrote:

Unfortunately I know very little about ES so I'm not in a great position to
assess the impact of these changes. I will say that legacy
compatibility is great, but maintaining two sets of logic is always a
challenge. A few follow-up questions:

1) What's the time horizon for the current adapter no longer working with
these changes to ES?

2) Any guess how complicated it would be to maintain code paths for both
behaviours? I know this is probably really challenging to estimate, but I
really have no idea of the scope of these changes. Would it mean two
different ES adapters?

3) Do we really need compatibility with the current version of the
adapter?

IMO this depends on what versions of ES we would lose support for and how
complex it would be for users of the current ES adapter to make updates for
any Calcite API changes.

Thanks for your continued work on the ES adapter Andrei!

--
Michael Mior
[email protected]



On Thu, 28 Jun 2018 at 12:57, Andrei Sereda <[email protected]> wrote:

Hello,

Elastic announced
<https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html>
that they will be deprecating mapping types in ES6 and indexes will be
single-typed only.

The historical analogy <https://www.elastic.co/blog/index-vs-type> between
RDBMS and Elastic was that an index is equivalent to a database and a type
corresponds to a table in that database. In a couple of releases (ES6-8)
this shall no longer be true.

The recent SQL addition
<https://www.elastic.co/blog/elasticsearch-6-3-0-released> to Elastic
confirms this trend
<https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>.
An index is equivalent to a table and there are no more ES types.

I would like to propose including this logic in the Calcite ES adapter,
i.e. exposing each single-typed ES index as a separate table inside a
Calcite schema. This is in contrast to the current integration, where a
schema can only have a single index. The current approach forces you to
create multiple schemas to query single-typed indexes (on the same ES
cluster).

Legacy compatibility can always be controlled with configuration
parameters.

Do you agree with such changes? If yes, would you consider a PR?

Regards,
Andrei.
