I would say that the plan you provided is the physical plan but this does not change the point that I was trying to make, which is that you have all the necessary information in the plan.
Andrei>Does it mean one has to manually keep mapping between EXPR and ITEM somewhere ? I would say yes. I am not sure if you really need to keep the mapping between EXPR and ITEM or just the field names but certainly it is up to you to store the information that you need somewhere. The right place I guess would be org.apache.calcite.adapter.elasticsearch.ElasticsearchRel.Implementor. Note that you could make also Implementor#visitChild method return a result instead of being void and add there necessary information. Andrei>Is there a better way ? An alternative solution (not sure if it is better) would be to put necessary information in the Elastic operators when moving from the logical plan to the physical plan by having custom rules. This would require possibly many changes so I guess in the end it is isn't worth it. Andrei>In Sort Rel (ElasticsearchSort). How to know that sort0=[$2] (EXPR$2) is sort on ITEM($0, 'c') ? Sorting is position based so you just have to know that field c is in position $2. Again you can exploit information that you added in Implementor. Στις Σάβ, 29 Δεκ 2018 στις 7:00 μ.μ., ο/η Andrei Sereda <[email protected]> έγραψε: > Let’s take an example of a simple query: > > -- note _id fieldselect _MAP['a'], _MAP['_id'] from elastic order by > _MAP['c'] > > Logical plan is as follows: > > ElasticsearchToEnumerableConverter > ElasticsearchSort(sort0=[$2], dir0=[ASC]) > ElasticsearchProject(EXPR$0=[ITEM($0, 'a')], EXPR$1=[ITEM($0, > '_id')], EXPR$2=[ITEM($0, 'c')]) > ElasticsearchTableScan(table=[[elastic, foo]]) > > Questions > > 1. In ElasticsearchEnumerators > < > https://github.com/asereda-gs/calcite/blob/elastic-idfield/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchEnumerators.java#L46 > > > I need to know if requested field is a meta field (_id) or not (to > correctly fetch it from payload). Does it mean one has to manually keep > mapping between EXPR and ITEM somewhere ? Is there a better way ? > 2. In Sort Rel (ElasticsearchSort). How to know that sort0=[$2] (EXPR$2) > is sort on ITEM($0, 'c') ? Mapping above (1) will help. > > Thanks for your help. > > On Sat, Dec 29, 2018 at 3:08 AM Stamatis Zampetakis <[email protected]> > wrote: > > > I think a concrete example of a query with its respective logical plan > > could clarify a few things. I would expect that the logical plan contains > > all the necessary information to decide which fields need to be fetched > > from Elastic without requiring additional aliases. If that information is > > lost when passing to the physical plan then we could possibly add some > > rules and/or operators to ensure this doesn't happen. > > > > Στις Παρ, 28 Δεκ 2018 στις 11:37 μ.μ., ο/η Andrei Sereda > <[email protected] > > > > > έγραψε: > > > > > Other than that it seems to me _MAP[‘_id’] will become a RexCall > similar > > to > > > ITEM($0, ‘_ID’) so it appears to me that you can retrieve the field > name > > by > > > implementing a RexVisitor. > > > > > > Yes. That’s what I’m doing (see isItem > > > < > > > > > > https://github.com/apache/calcite/pull/982/files#diff-d1419ede28af115c0b45d6e02828cb6fR85 > > > > > > > ) > > > > > > I don’t see clearly why you need to set explicitly aliases but probably > > is > > > because I am missing implementation details related with the adapter > > > itself. > > > > > > For physical plan I get a list of expressions / projections (via > > > RelDataType > > > / getRowType) but at some point I need to know if (for example) EXPR$0 > == > > > ITEM($0, '_id') (_id is a system field and needs to be fetched > > differently > > > from payload). RelDataType doesn’t have item information. > > > > > > That’s the only reason I’m manually saving this mapping. Perhaps there > > is a > > > better way to do it ? > > > > > > On Fri, Dec 28, 2018 at 2:08 AM Stamatis Zampetakis <[email protected] > > > > > wrote: > > > > > > > Hi Andrei, > > > > > > > > I tried to have a look in the PR but I am not familiar with the > Elastic > > > > adapter code so I am missing a lot of context. > > > > > > > > In general, I think that relying on the on the field names coming > from > > a > > > > RelDataType it is not safe. As you observed, it can get messed up > quite > > > > easily from many parts of Calcite. > > > > It is usually better to try to reason using the position and/or the > > type > > > of > > > > the field rather than its name. > > > > > > > > Other than that it seems to me _MAP['_id'] will become a RexCall > > similar > > > to > > > > ITEM($0, '_ID') so it appears to me that you can retrieve the field > > name > > > by > > > > implementing a RexVisitor. > > > > I don't see clearly why you need to set explicitly aliases but > probably > > > is > > > > because I am missing implementation details related with the adapter > > > > itself. > > > > > > > > Best, > > > > Stamatis > > > > > > > > Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde < > > [email protected]> > > > > έγραψε: > > > > > > > > > I added comments to the case. I did not look at the PR, sorry. > > > > > > > > > > > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <[email protected]> > > wrote: > > > > > > > > > > > > Hello, > > > > > > > > > > > > Can somebody pls take a look at the following PR: > > > > > > > > > > > > PR: https://github.com/apache/calcite/pull/982 > > > > > > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755 > > > > > > > > > > > > Specifically, I would like to understand if adding explicit > mapping > > > > > > < > > > > > > > > > > > > > > > https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75 > > > > > > > > > > > > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a > > > correct > > > > > > (and recommended) approach. The reason I need it is because the > > > > original > > > > > > item name (eg. `_MAP['foo.bar']`) is lost after converter > operation > > > (ie > > > > > I'm > > > > > > not able to derive it from `input.getRowType()`) > > > > > > > > > > > > ```sql > > > > > > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and > > > > > 'foo.bar' > > > > > > select _MAP['_id'], _MAP['foo.bar'] from elastic > > > > > > > > > > > > -- Query below preserves attribute information > > > > > > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as > > > > > > \"foo.bar\" from elastic) > > > > > > ``` > > > > > > > > > > > > Many thanks, > > > > > > Andrei. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis < > > > [email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > >> Hi Andrei, > > > > > >> > > > > > >> Many data management systems have internal columns which the > users > > > may > > > > > be > > > > > >> able to query (see for example Postgres [1]). It doesn't sound > > > > > unnatural to > > > > > >> allow users access these fields. On the other hand such system > > > columns > > > > > do > > > > > >> not appear in SELECT * queries and I suppose they are pruned out > > by > > > > > >> projections early on if they are not used by the query. I guess > > you > > > > > could > > > > > >> take a similar approach in the ElasticSearch adapter. > > > > > >> > > > > > >> Best, > > > > > >> Stamatis > > > > > >> > > > > > >> [1] > > https://www.postgresql.org/docs/current/ddl-system-columns.html > > > > > >> > > > > > >> > > > > > >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda > > > > > <[email protected]> > > > > > >> έγραψε: > > > > > >> > > > > > >>> Hello, > > > > > >>> > > > > > >>> Upon indexing, elastic allocates a unique _id > > > > > >>> < > > > > > >>> > > > > > >> > > > > > > > > > > > > > > > https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html > > > > > >>>> > > > > > >>> for each document (unless specified by the user). Currently > when > > > > > mapping > > > > > >>> between elastic result and calcite return type we query only > > > _source > > > > or > > > > > >>> fields attributes. _id is not part those but exposed at higher > > > level > > > > in > > > > > >>> query response (see SearchHit > > > > > >>> < > > > > > >>> > > > > > >> > > > > > > > > > > > > > > > https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html > > > > > >>>> > > > > > >>> ). > > > > > >>> > > > > > >>> Currently _MAP['foo'] becomes _source.get('foo') (or > > > > > fields.get('foo')). > > > > > >>> > > > > > >>> Do you think we should make _MAP['_id'] available implicitly ? > > > > > >>> > > > > > >>> I have a couple of use-cases where one needs to know document > ID. > > > > > >>> > > > > > >>> Please note this makes sense only for non-aggregate queries. > > > > > >>> > > > > > >>> Regards, > > > > > >>> > > > > > >>> Andrei. > > > > > >>> > > > > > >> > > > > > > > > > > > > > > > > > > > >
