Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-06 Thread Xuannan Su
Hi all, Thank you for all the reviews and suggestions. I believe all the comments have been addressed. If there are no further comments, I plan to open the voting thread for this FLIP early next week. Best regards, Xuannan On Thu, Sep 7, 2023 at 12:09 AM Jing Ge wrote: > > Hi Xuannan, > > I

[jira] [Created] (FLINK-33054) Align the job execution result fetching timeout in CollectResultFetcher to akka timeout

2023-09-06 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-33054: -- Summary: Align the job execution result fetching timeout in CollectResultFetcher to akka timeout Key: FLINK-33054 URL: https://issues.apache.org/jira/browse/FLINK-33054

Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-09-06 Thread Jane Chan
Hi Tawfik, In response to this issue, we have authored a research paper outlining a > novel strategy named "keyed watermarks" to address data loss and > substantially enhance data processing accuracy, achieving at least 99% > accuracy in most scenarios. > Sounds like a significant improvement!

[jira] [Created] (FLINK-33053) Watcher leak in Zookeeper HA mode

2023-09-06 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-33053: -- Summary: Watcher leak in Zookeeper HA mode Key: FLINK-33053 URL: https://issues.apache.org/jira/browse/FLINK-33053 Project: Flink Issue Type: Bug

[DISCUSS][FLINK-31788][FLINK-33015] Add back Support emitUpdateWithRetract for TableAggregateFunction

2023-09-06 Thread Jane Chan
Hi devs, Recently, we noticed an issue regarding a feature regression related to Table API. `org.apache.flink.table.functions.TableAggregateFunction` provides an API `emitUpdateWithRetract` [1] to cope with updated values, but it's not being called in the code generator. As a result, even if

Re: [DISCUSS] FLIP-358: flink-avro enhancement and cleanup

2023-09-06 Thread Becket Qin
Hi Stephen, I don't think you should compare the DataType with the AvroSchema directly. They are for different purposes and sometimes cannot be mapped in both directions. As of now, the following conversions are needed in Flink format: 1. Avro Schema -> Flink Table Schema (DataType). This is

Re: 来自Jiangchuan.Lee的邮件

2023-09-06 Thread Jane Chan
Hi Jiangchuan, If you already have a JIRA account, you can comment under the issue you're interested in (typically starting with bug fixes or documentation and outlining how you'll do it) and request the committer assign it to you. Best, Jane On Thu, Sep 7, 2023 at 9:45 AM Jiangchuan.Lee

Re: [DISCUSS] FLIP-358: flink-avro enhancement and cleanup

2023-09-06 Thread Becket Qin
Hi Jing, Thanks for the explanation. Since SourceFunction is already deprecated and we are working on > SinkFunction deprecation for 1.19, I would suggest directly > marking InputFormat and OutputFormat as deprecated. Because, once we mark > them as public in one release, users might start to

Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-09-06 Thread liu ron
Hi Tawfik, Fast and slow streaming in distributed scenarios leads to watermark advancing too fast, which leads to lost data and is a headache in Flink. Can't wait to read your research paper! Best, Ron Yun Tang 于2023年9月6日周三 14:46写道: > Hi Tawfik, > > Thanks for offering such a proposal,

来自Jiangchuan.Lee的邮件

2023-09-06 Thread Jiangchuan.Lee
Hi, I want to contribute to Apache Flink. Would you please give me the contributor permission? My JIRA ID is LeeJiangchuan.

Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-09-06 Thread liu ron
Hi, Boonie I'm with Jark on why disable hint is needed if it won't affect security. If users don't need to use hint, then they won't care about it and I don't think it's going to be a nuisance. On top of that, Lookup Join Hint is very useful for streaming jobs, and disabling the hint would result

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-06 Thread Venkatakrishnan Sowrirajan
Hi everyone, Posted a PR (https://github.com/apache/flink/pull/23313) to add nested fields filter pushdown. Please review. Thanks. Regards Venkata krishnan On Tue, Sep 5, 2023 at 10:04 PM Venkatakrishnan Sowrirajan wrote: > Based on an offline discussion with Becket Qin, I added

PyFlink logging

2023-09-06 Thread Őrhidi Mátyás
Hey folks, I'm struggling to find in the code where we set/override the logging configuration? logging.basicConfig(level=logging.INFO) logging.info("I'm printed") env = StreamExecutionEnvironment.get_execution_environment() logging.info("I'm not") Can you give me some pointers? Thanks, Matyas

Re: [NOTICE] Experimental Java 17 support now available on master

2023-09-06 Thread Jing Ge
Hi folks, Sorry to come back to this thread, since I saw FLINK-32327 is already closed and Chesnay announced on 16th June that Flink master branch (at that time, now it should be both 1.18 and master branches) builds and runs with Java 17 out-of-the-box. But according to the feedback from Kurt,

[jira] [Created] (FLINK-33052) codespeed server is down

2023-09-06 Thread Jing Ge (Jira)
Jing Ge created FLINK-33052: --- Summary: codespeed server is down Key: FLINK-33052 URL: https://issues.apache.org/jira/browse/FLINK-33052 Project: Flink Issue Type: Bug Components: Test

Re: [DISCUSS] FLIP-358: flink-avro enhancement and cleanup

2023-09-06 Thread Jing Ge
Hi Becket, Thanks for the clarification. > StreamFormatAdapter is internal and it requires a StreamFormat > implementation for Avro files which does not exist either. > I thought the cases 1-6 described in the FLIP mean there is a StreamFormat implementation for Avro. That was my fault. I

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-06 Thread Jing Ge
Hi Xuannan, I thought FLIP-328 will compete with FLIP-309 while setting the value of the backlog. Understood. Thanks for the hint. Best regards, Jing On Wed, Sep 6, 2023 at 12:12 PM Xuannan Su wrote: > Hi Jing, > > Thank you for the clarification. > > For the use case you mentioned, I believe

[jira] [Created] (FLINK-33051) GlobalFailureHandler interface should be retired in favor of LabeledGlobalFailureHandler

2023-09-06 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created FLINK-33051: -- Summary: GlobalFailureHandler interface should be retired in favor of LabeledGlobalFailureHandler Key: FLINK-33051 URL:

Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-09-06 Thread Bonnie Arogyam Varghese
Hi Liu Ron, To answer your question, Security might not be the main reason for disabling this option but other arguments brought forward by Timo. Let me know if you have any further questions or concerns. On Tue, Sep 5, 2023 at 9:35 PM Bonnie Arogyam Varghese < bvargh...@confluent.io> wrote:

Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-06 Thread Venkatakrishnan Sowrirajan
The voting time for [VOTE] FLIP-356: Support Nested Fields Filter Pushdown has passed. I'm closing the vote now. There were 5 +1 votes which were binding and 3 +1 votes which were non-binding:

Re: [DISCUSS] Add config to enable job stop with savepoint on exceeding tolerable checkpoint Failures

2023-09-06 Thread Martijn Visser
Hi dongwoo, Thanks for the proposal. I don't think this is a good idea, especially since there's no guarantee that if a checkpoint has failed, that a savepoint will work. I also don't think it's necessarily a good idea that Flink needs to change because of an external factor: isn't there another

Re: [DISCUSS] Add config to enable job stop with savepoint on exceeding tolerable checkpoint Failures

2023-09-06 Thread Dongwoo Kim
Hi Yanfei, Hangxiang. Thank you for taking the time to review my suggestions. I agree with Hangxiang that simply triggering a savepoint based on repeated checkpoint failures isn't a reasonable approach. Adding a specific condition, like *CHECKPOINT_ASYNC_EXCEPTION* as the reason for the last

[jira] [Created] (FLINK-33050) Prompts user to close when atomicity implementation is not supported

2023-09-06 Thread tartarus (Jira)
tartarus created FLINK-33050: Summary: Prompts user to close when atomicity implementation is not supported Key: FLINK-33050 URL: https://issues.apache.org/jira/browse/FLINK-33050 Project: Flink

Re: [DISCUSS] FLIP-358: flink-avro enhancement and cleanup

2023-09-06 Thread 吴 stephen
Hi Becket, I notice that a new config will introduce to Avro Format and user can input their own schema. Since the user can input their schema , should Avro Format support a validation utils that validate whether the input schema is compatible with table columns? I’m modifying the

Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-09-06 Thread Maximilian Michels
Hey Rui, hey Samrat, I want to ensure this is not just an exercise but has actual benefits for the community. In the past, I've seen that the effort stops half way through, the refactoring gets done with some regressions, but actual alternative implementations based on the new design never

Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-06 Thread Biao Geng
+1 for the proposal. Best, Biao Geng Gyula Fóra 于2023年9月6日周三 16:10写道: > @Zhanghao Chen: > > I am not completely sure at this point what this will mean for 2.0 simply > because I am also not sure what that will mean for the operator as well :) > I think this will depend on the compatibility

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-06 Thread Xuannan Su
Hi Jing, Thank you for the clarification. For the use case you mentioned, I believe we can utilize the HybridSource, as updated in FLIP-309[1], to determine the backlog status. For example, if the user wants to process data before time T in batch mode and after time T in stream mode, they can

Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-09-06 Thread Rui Fan
Hi Max, As the FLIP mentioned, we have the plan to add the alternative implementation. First of all, we will develop a generic autoscaler. This generic autoscaler will not have knowledge of specific jobs, and users will have the flexibility to pass the JobAutoScalerContext when utilizing the

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Xintong Song
> > I added the average time metric to the FLIP document. I also included it > for the aggregate (total) across all collectors. But maybe it doesn't make > too much sense as collection times usually differ greatly depending on the > collector. > LGTM Best, Xintong On Wed, Sep 6, 2023 at

[jira] [Created] (FLINK-33048) Project website refresh Light Mode

2023-09-06 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-33048: - Summary: Project website refresh Light Mode Key: FLINK-33048 URL: https://issues.apache.org/jira/browse/FLINK-33048 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-33049) Project website refresh Code Samples Widget

2023-09-06 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-33049: - Summary: Project website refresh Code Samples Widget Key: FLINK-33049 URL: https://issues.apache.org/jira/browse/FLINK-33049 Project: Flink Issue Type:

[jira] [Created] (FLINK-33047) Project website refresh Dark Mode

2023-09-06 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-33047: - Summary: Project website refresh Dark Mode Key: FLINK-33047 URL: https://issues.apache.org/jira/browse/FLINK-33047 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-33046) [FLIP-333] Redesign Apache Flink website

2023-09-06 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-33046: - Summary: [FLIP-333] Redesign Apache Flink website Key: FLINK-33046 URL: https://issues.apache.org/jira/browse/FLINK-33046 Project: Flink Issue Type: New

Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-06 Thread Gabor Somogyi
Hi Xingbo, Thanks for your support! I agree that multi-platform in general is a good idea. We start with ARM but this opens the door for all others. G On Wed, Sep 6, 2023 at 10:46 AM Xingbo Huang wrote: > Hi Gabor, > > Thanks a lot for the detailed explanation. Building a test environment

Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-06 Thread Xingbo Huang
Hi Gabor, Thanks a lot for the detailed explanation. Building a test environment with multi-platforms is indeed what we have been lacking, not only for M1 users, but also for Windows users. At 1.16, I remember that Azure did not provide the m1 environment, and the github action just planned m1 in

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Gyula Fóra
I added the average time metric to the FLIP document. I also included it for the aggregate (total) across all collectors. But maybe it doesn't make too much sense as collection times usually differ greatly depending on the collector. Gyula On Wed, Sep 6, 2023 at 10:21 AM Xintong Song wrote: >

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Xintong Song
Thank you :) Best, Xintong On Wed, Sep 6, 2023 at 4:17 PM Gyula Fóra wrote: > Makes sense Xintong, I am happy to extend the proposal with the average gc > time metric +1 > > Gyula > > On Wed, Sep 6, 2023 at 10:09 AM Xintong Song > wrote: > > > > > > > Just so I understand correctly, do you

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Gyula Fóra
Makes sense Xintong, I am happy to extend the proposal with the average gc time metric +1 Gyula On Wed, Sep 6, 2023 at 10:09 AM Xintong Song wrote: > > > > Just so I understand correctly, do you suggest adding a metric for > > delta(Time) / delta(Count) since the last reporting ? > >

Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-06 Thread Gyula Fóra
@Zhanghao Chen: I am not completely sure at this point what this will mean for 2.0 simply because I am also not sure what that will mean for the operator as well :) I think this will depend on the compatibility guarantees we can provide across Flink major versions in general. We have to look into

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Xintong Song
> > Just so I understand correctly, do you suggest adding a metric for > delta(Time) / delta(Count) since the last reporting ? > .TimePerGc or .AverageTime would make sense. > AverageTime may be a bit nicer :) > Yes, that's what I mean. My only concern is how useful this will be in reality. If

Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-06 Thread Gabor Somogyi
Hi Xingbo, > So I guess you want to run ci tests on the m1 > environment, but the current version of miniconda cannot meet this > requirement, so there is a pre-step that must drop python 3.7? In short yes + local wheel build fix on the M1. In a bit more detailed please let me elaborate.

[jira] [Created] (FLINK-33045) format avro-confluent - disable auto-registering schema in registry

2023-09-06 Thread Michael Lorenzi (Jira)
Michael Lorenzi created FLINK-33045: --- Summary: format avro-confluent - disable auto-registering schema in registry Key: FLINK-33045 URL: https://issues.apache.org/jira/browse/FLINK-33045 Project:

[jira] [Created] (FLINK-33044) Reduce the frequency of triggering flush for the disk tier of the tiered storage

2023-09-06 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-33044: - Summary: Reduce the frequency of triggering flush for the disk tier of the tiered storage Key: FLINK-33044 URL: https://issues.apache.org/jira/browse/FLINK-33044 Project:

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Gyula Fóra
Matt Wang, I think the currently exposed info is all that is available through GarbageCollectorMXBean. This FLIP does not aim to introduce a new more granular way of reporting the per collector metrics, that would require a new mechanism and may be a breaking change. We basically want to simply

Re: [DISCUSS] Add config to enable job stop with savepoint on exceeding tolerable checkpoint Failures

2023-09-06 Thread Hangxiang Yu
Hi, Dongwoo. IIUC, you mean using savepoint to store a snapshot to other storage if checkpoints fail multiple times due to some long lasting exceptions of external storage, right ? I think it's better to achieve this by an external tool instead of introducing a config like that: 1. it's not so

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread liu ron
Hi, Gyula Thanks for driving this proposal, GC-related metrics are beneficial for us to profile the root cause, +1 for this proposal. Best, Ron Matt Wang 于2023年9月6日周三 15:24写道: > Hi Gyula, > > +1 for this proposal. > > Do we need to add a metric to record the count of different > collectors?

[jira] [Created] (FLINK-33043) Search is broken on the Flink website

2023-09-06 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-33043: - Summary: Search is broken on the Flink website Key: FLINK-33043 URL: https://issues.apache.org/jira/browse/FLINK-33043 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-33042) Allow trigger flamegraph when task is initializing

2023-09-06 Thread Rui Fan (Jira)
Rui Fan created FLINK-33042: --- Summary: Allow trigger flamegraph when task is initializing Key: FLINK-33042 URL: https://issues.apache.org/jira/browse/FLINK-33042 Project: Flink Issue Type:

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Matt Wang
Hi Gyula, +1 for this proposal. Do we need to add a metric to record the count of different collectors? Now there is only a total count. For example, for G1, there is no way to distinguish whether it is the young generation or the old generation. -- Best, Matt Wang Replied Message

Re: [DISCUSS] FLIP-357: Deprecate Iteration API of DataStream

2023-09-06 Thread Stefan Richter
+1 Thanks, Stefan > On 5. Sep 2023, at 10:45, David Morávek wrote: > > +1 since there is an alternative, more complete implementation available > > Best, > D. > > On Sat, Sep 2, 2023 at 12:07 AM David Anderson > wrote: > >> +1 >> >> Keeping the legacy

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Gyula Fóra
Thanks Xintong! Just so I understand correctly, do you suggest adding a metric for delta(Time) / delta(Count) since the last reporting ? .TimePerGc or .AverageTime would make sense. AverageTime may be a bit nicer :) My only concern is how useful this will be in reality. If there are only (or

Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-06 Thread Xingbo Huang
Hi Gyala and Gabor, Pyflink has provided arm64 wheel packages for Apple silicon since 1.16[1]. The use of Miniconda is only related to ci testing and packaging on linux platform, and building mac platform wheels are dependent on cibuildwheel[2]. So I guess you want to run ci tests on the m1

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Xintong Song
Thanks for bringing this up, Gyula. The proposed changes make sense to me. +1 for them. In addition to the proposed changes, I wonder if we should also add something like timePerGc? This would help understand whether there are long pauses, due to GC STW, that may lead to rpc unresponsiveness and

Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-09-06 Thread Yun Tang
Hi Tawfik, Thanks for offering such a proposal, looking forward to your research paper! You could also ask the edit permission for Flink improvement proposals to create a new proposal if you want to contribute this to the community by yourself. [1]

Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-06 Thread Gabor Somogyi
Hi Xingbo, *Constraint:* I personally not found any miniconda version which provides arm64 support together with python 3.7. [image: image.png] At the moment I think new platform support means 3.7 drop. I fully to agree with Gyula, if we start now maybe we can release it in half a year however

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-06 Thread Rui Fan
Thanks for the clarification! By default the meterview measures for 1 minute sounds good to me! +1 for this proposal. Best, Rui On Wed, Sep 6, 2023 at 1:27 PM Gyula Fóra wrote: > Thanks for the feedback Rui, > > The rates would be computed using the MeterView class (like for any other > rate

Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-06 Thread Gyula Fóra
Hi Xingbo! I think we have to analyze what we gain by dropping 3.7 and upgrading to a miniconda version with a multiarch support. If this is what we need to get Apple silicon support then I think it's worth doing it already in 1.19. Keep in mind that 1.18 is not even released yet so if we delay