[
https://issues.apache.org/jira/browse/FLINK-23911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403000#comment-17403000
]
Ingo Bürk edited comment on FLINK-23911 at 8/23/21, 7:42 AM:
-------------------------------------------------------------
I debugged a little more. Only the second case you mentioned is actually a
problem (when no metadata columns remain after the projection). If any metadata
columns are selected, #applyReadableMetadata is called twice, and the second
call correctly reflects the projection. However, if only physical columns are
selected, #applyReadableMetadata is called only once, with all metadata keys.
-I am also quite surprised that #applyReadableMetadata is called multiple times
on a source. Depending on the implementation this can cause unexpected
behavior, so we should probably document that this method (and those of all
other abilities?) must be idempotent.- (Edit: it's called on different
instances of the source)
-Edit: It seems this might also differ between 1.13 and master? In my 1.13
project I couldn't reproduce the same behavior, but continuing to look into
this-
Edit 2: That was not correct; this just depends on whether
SupportsProjectionPushDown is implemented.
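To make the two cases concrete, here is a minimal standalone sketch in plain Java with no Flink dependency. RecordingSource, its fields, and readMetadata are hypothetical stand-ins; only the method name applyReadableMetadata comes from SupportsReadableMetadata, and its DataType parameter is left out for brevity. The sketch assumes that the first call carries every declared key and that each call lands on a separate source instance, as described above. It is not the planner's actual code path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataPushDownSketch {

    /** Hypothetical stand-in for a source that supports readable metadata. */
    static final class RecordingSource {
        // Keys from the most recent call; replaced on every call, never appended.
        private List<String> metadataKeys = new ArrayList<>();
        // Counts how many metadata values this instance had to produce.
        int computations = 0;

        void applyReadableMetadata(List<String> keys) {
            this.metadataKeys = new ArrayList<>(keys);
        }

        /** Produces the metadata for one row, computing a value per pushed key. */
        Map<String, String> readMetadata() {
            Map<String, String> row = new LinkedHashMap<>();
            for (String key : metadataKeys) {
                computations++; // each key may be expensive in a real source
                row.put(key, "value-of-" + key);
            }
            return row;
        }
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("m1", "m2");

        // SELECT f0, m1: a second call, on another instance, gets only "m1".
        RecordingSource firstInstance = new RecordingSource();
        firstInstance.applyReadableMetadata(declared);
        RecordingSource projected = new RecordingSource();
        projected.applyReadableMetadata(Collections.singletonList("m1"));
        System.out.println(projected.readMetadata().keySet()); // prints [m1]

        // SELECT f0: a single call with all declared keys, none of them needed.
        RecordingSource physicalOnly = new RecordingSource();
        physicalOnly.applyReadableMetadata(declared);
        System.out.println(physicalOnly.readMetadata().keySet()); // prints [m1, m2]
        System.out.println(physicalOnly.computations); // prints 2
    }
}
```

In the physical-only case the source produces two metadata values per row although the query needs none, which is the unnecessary work the issue describes.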
> Projections are not considered when pushing readable metadata into a source
> ---------------------------------------------------------------------------
>
> Key: FLINK-23911
> URL: https://issues.apache.org/jira/browse/FLINK-23911
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.13.2
> Reporter: Ingo Bürk
> Priority: Major
>
> Given a table with a declared schema containing some metadata columns, if we
> select only some of those metadata columns (or none), the interface of
> SupportsReadableMetadata states that the planner will perform the projection
> and only push required metadata keys into the source:
> {quote}The planner will select required metadata columns (i.e. perform
> projection push down) and will call \{@link #applyReadableMetadata(List,
> DataType)} with a list of metadata keys.{quote}
> However, it seems that this doesn't happen, and the planner always applies
> all metadata declared in the schema instead. This can be a problem because
> the source has to do unnecessary work, and some metadata might be more
> expensive to compute than others.
> For reference, SupportsProjectionPushDown cannot be used to work around this
> because it operates only on physical columns, i.e. #applyProjection will
> never be called with a projection for the metadata columns, even if they are
> selected.
> The following test case can be executed to debug into #applyReadableMetadata
> of the values table source:
> {code:java}
> @Test
> def test(): Unit = {
>   val tableId = TestValuesTableFactory.registerData(Seq())
>   tEnv.createTemporaryTable("T", TableDescriptor.forConnector("values")
>     .schema(Schema.newBuilder()
>       .column("f0", DataTypes.INT())
>       .columnByMetadata("m1", DataTypes.STRING())
>       .columnByMetadata("m2", DataTypes.STRING())
>       .build())
>     .option("data-id", tableId)
>     .option("bounded", "true")
>     .option("readable-metadata", "m1:STRING,m2:STRING")
>     .build())
>   tEnv.sqlQuery("SELECT f0, m1 FROM T").execute().collect().toList
> }
> {code}