Hi Rui,

This article may answer your question:
https://docs.google.com/document/d/1bYYPg3OJvAivTCVf9-2hBq1BT6jS7s64lyuMuM4ATV4/edit#heading=h.qn6yq5t0ot50
Chinese version: https://mp.weixin.qq.com/s/LvKaj5ytk6imEU5Dc1Sr5Q

> On Oct 10, 2020, at 9:16 PM, Rui Li <[email protected]> wrote:
>
> Thanks for pointing me to the RFC! When using Spark to write a table, we
> need to launch several Spark jobs, e.g. to search index and tag locations,
> workload profiling, etc. Now RFC-13 aims to encapsulate all these in a
> single Flink DAG, right? Do we have plans about how to achieve this?
>
> On Tue, Sep 29, 2020 at 9:40 AM 王** <[email protected]> wrote:
>
>> Hi Rui
>> Thanks for asking, the design for Flink integration can be found here:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=141724520
>> please ping me if you have any questions.
>>
>>
>> At 2020-09-28 20:43:22, "Rui Li" <[email protected]> wrote:
>>> Hello,
>>>
>>> Very excited to see the on-going efforts for Flink integration. I wonder
>>> whether there's a design doc for this feature? I would like to learn more
>>> and hopefully to make some contributions.
>>>
>>> On Fri, Sep 25, 2020 at 6:27 AM nishith agarwal <[email protected]> wrote:
>>>
>>>> Yes, we have some ideas around schema evolution and have discussed with
>>>> Balaji before as well. I'm going to put these thoughts down and share it on
>>>> the cWiki for all of us to jam. Realistically, I don't think we can hit it in
>>>> 0.7.0. We already have a pretty strong list of items for 0.7.0.
>>>>
>>>> Spark 3 SQL syntax like MERGE will definitely boost usability!
>>>>
>>>> Thanks,
>>>> Nishith
>>>>
>>>> On Thu, Sep 24, 2020 at 3:22 PM Vinoth Chandar <[email protected]> wrote:
>>>>
>>>>> On schema evolution, Nishith and Balaji were both thinking about this.
>>>>> Maybe there is a proposal in the works?
>>>>> I would guess we will not be able to hit it in 0.7.0 though. Maybe by the
>>>>> end of year/0.8.0?
>>>>>
>>>>> Tanu, thanks for the kind words! def, if we pull together, we will reach
>>>>> there sooner. Looking forward to more contributions! :)
>>>>>
>>>>>> We were actually thinking of moving to Spark 3.0 but thought it’s too
>>>>>> early with 0.6 release. Is 0.6 not fully tested with Spark 3.0?
>>>>> That's correct. There is a PR already open for this. We expect this to be
>>>>> fixed in 0.6.1 shortly and we will unlock Spark 3.0 support.
>>>>>
>>>>> 0.7.0 will bring Spark 3 SQL syntax like MERGE etc. (Other systems that
>>>>> have had this, either had an unfair head start or built ahead with Spark 3
>>>>> in mind. :))
>>>>> We will close this gap down.
>>>>>
>>>>> On Wed, Sep 23, 2020 at 6:25 PM Raymond Xu <[email protected]> wrote:
>>>>>
>>>>>> +1 on the full schema evolution support. May I know which ticket this is
>>>>>> related to? thanks.
>>>>>>
>>>>>> On Wed, Sep 23, 2020 at 5:20 AM leesf <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Vinoth, also we would consider supporting full schema evolution
>>>>>>> (such as dropping some fields) of Hudi in 0.7.0, since right now Hudi
>>>>>>> follows Avro schema compatibility.
>>>>>>>
>>>>>>> On Wed, Sep 23, 2020 at 12:38 PM tanu dua <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Vinoth. These are really exciting items and hats off to you and team
>>>>>>>> in pushing the releases swiftly and improving the framework all the time. I
>>>>>>>> hope someday I will start contributing once I get free from my major
>>>>>>>> deliverables and have more understanding of the nitty-gritty details of Hudi.
>>>>>>>>
>>>>>>>> You have mentioned Spark 3.0 support in the next release. We were actually
>>>>>>>> thinking of moving to Spark 3.0 but thought it’s too early with 0.6
>>>>>>>> release. Is 0.6 not fully tested with Spark 3.0?
>>>>>>>>
>>>>>>>> On Wed, 23 Sep 2020 at 8:25 AM, Vinoth Chandar <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> Pursuant to our conversation around release planning, I am happy to share
>>>>>>>>> the initial set of proposals for the next minor/major releases (minor
>>>>>>>>> release ofc can go out based on time)
>>>>>>>>>
>>>>>>>>> *Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0..)*
>>>>>>>>> Flink/Writer common refactoring for Flink
>>>>>>>>> Small file handling support w/o caching
>>>>>>>>> Spark3 Support
>>>>>>>>> Remaining bootstrap items
>>>>>>>>> Completing bulk_insertV2 (sort mode, de-dup etc)
>>>>>>>>> Full list here :
>>>>>>>>> https://issues.apache.org/jira/projects/HUDI/versions/12348168
>>>>>>>>>
>>>>>>>>> *0.7.0 with major new features*
>>>>>>>>> RFC-15: metadata, range index (w/ spark support), bloom index (eliminate
>>>>>>>>> file listing, query pruning, improve bloom index perf)
>>>>>>>>> RFC-08: Record Index (to solve global index scalability/perf)
>>>>>>>>> RFC-18/19: Clustering/Insert overwrite
>>>>>>>>> Spark 3 based datasource rewrite (structured streaming sink/source,
>>>>>>>>> DELETE/MERGE)
>>>>>>>>> Incremental Query on logs (Hive, Spark)
>>>>>>>>> Parallel writing support
>>>>>>>>> Redesign of marker files for S3
>>>>>>>>> Stretch: ORC, PrestoSQL Support
>>>>>>>>>
>>>>>>>>> Full list here :
>>>>>>>>> https://issues.apache.org/jira/projects/HUDI/versions/12348721
>>>>>>>>>
>>>>>>>>> Please chime in with your thoughts. If you would like to commit to
>>>>>>>>> contributing a feature towards a release, please do so by marking *`Fix
>>>>>>>>> Version/s`* field with that release number.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Vinoth
>>>
>>> --
>>> Cheers,
>>> Rui Li
>
> --
> Best regards!
> Rui Li
